From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Leake
Newsgroups: gmane.emacs.devel
To: emacs-devel
Subject: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 00:21:39 -0400
Message-ID: <85pq4wgrho.fsf@member.fsf.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)
Mime-Version: 1.0
Content-Type: text/plain

I hit a major snag in the new Emacs Ada mode indentation engine, which
I resolved by allowing access to the stack SMIE uses in smie-next-sexp.

See the patch below; it was made from smie.el in Emacs 24.2.1. It
replaces the local variable `levels' (which contains the stack) with a
global variable `smie--levels', so ada-indent-forward-token can see the
stack contents.

The problem I had is that it is not possible to refine the Ada "begin"
keyword using only local information. "begin" is used in two ways: as
the _start_ of a block, and as the _divider_ between declarations and
statements in a block:

function F1 is
begin -- divider (refined to "begin-body")
   begin -- block start (refined to "begin-open")
   end;
end;

These two uses must be different tokens in SMIE; "begin-open" is an
opener, while "begin-body" is neither opener nor closer.

The only way to figure out which role "begin" is playing is to parse
from the start of the compilation unit. Consider a package body:

package body Pack_1 is
   function F1 is
   begin -- divider
      begin -- block start
      end;
      begin -- block start
      end;
   end;
   begin -- divider
   end;

Here I've deliberately got the indentation wrong at the end, to
emphasize the ambiguity. If we just look back or forward a few keywords
from each "begin", we can't tell which role it is playing. In
particular, the package-level "begin" just looks like it follows a
bunch of statements/declarations.
We must go all the way back to "package" to sort it out.

When all the tokens are properly disambiguated, SMIE can traverse
correctly from the package "begin" to "package". But we can't do that
while disambiguating "begin"; that's circular.

The solution I found is to deliberately call `smie-forward-sexp'
starting at the beginning of the buffer. Then, when
ada-indent-forward-token encounters a "begin", it examines the parser
stack. We only have to look at a few tokens on the stack to determine
whether we've got a "divider" use or a "start" use.

Of course, parsing forward from the start of the buffer for each
"begin" is painfully slow (I tried that), so we also need a cache. I
store the refined keyword as a text property on each keyword;
ada-indent-forward-token and ada-indent-backward-token check for that
text property before calling the refine function.

The cache is invalidated when editing occurs before a keyword that has
cached information. This is handled by storing the maximum valid cache
position in `ada-indent-cache-max'; it is moved forward when a keyword
is refined, and moved backward by `ada-indent-after-change' when the
buffer is changed.

I use the cache for all keywords, since the mechanism is simple, and it
does provide some speed-up (some of the ada-indent-refine-* functions
are quite lengthy).

The full code for my current version is at
http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0

That includes the patched version of smie.el.

At the moment, only the function ada-indent-refine-begin uses the
"parse from the beginning" approach, but there are other keywords that
need it; I plan to factor that out soon.

I doubt that we really want to make the parser stack directly
accessible to SMIE clients; a copy would be safer. I implemented it
this way because it was the smallest change that let me test the idea.

On the other hand, there may be times when we'd like to add a token to
the stack, to handle broken source code, for example.
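
To make the "a copy would be safer" idea concrete, here is a minimal
sketch of how a let-bound special variable is visible to the token
function, and how a snapshot keeps the client from touching the real
stack. Everything named my-demo-* is hypothetical: `my-demo--levels'
only stands in for the patch's `smie--levels', and the level numbers are
arbitrary placeholders, not values from any real grammar.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical stand-in for the patched `smie--levels'.  Because it is
;; declared with `defvar', a `let' of it is a dynamic binding, so
;; functions called inside the `let' see the bound value.
(defvar my-demo--levels nil
  "Parser stack, dynamically bound while `my-demo-next-sexp' runs.")

(defun my-demo-stack-snapshot ()
  "Return a deep copy of the current stack that callers may modify."
  (copy-tree my-demo--levels))

(defun my-demo-next-sexp (next-token)
  "Bind the stack with `let', then call NEXT-TOKEN.
This only mimics the shape of the patched parser loop."
  (let ((my-demo--levels (list (list 10 20) (list 30 40))))
    (funcall next-token)))

(defun my-demo-forward-token ()
  "Inspect the top of the stack through a snapshot.
Return a list of the real top entry and the modified copy."
  (let ((stack (my-demo-stack-snapshot)))
    ;; Destructive change; it affects only the copy.
    (setcar (car stack) 'clobbered)
    (list (car my-demo--levels) (car stack))))
```

Calling (my-demo-next-sexp #'my-demo-forward-token) returns the real top
entry unchanged next to the modified copy, and `my-demo--levels' is nil
again once the call returns. Handing out a copy costs one `copy-tree'
per inspection, which should be cheap if only a few stack entries are
ever examined.
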
Also, the name of the global variable could be better; perhaps
`smie-parser-stack'.

It might also make sense to incorporate the refined keyword cache
mechanism into smie.

-- 
-- Stephe

--- smie.el
+++ smie.el
@@ -672,6 +672,9 @@ it should move backward to the beginning
   ;; has to be careful to distinguish those different cases.
   (eq (smie-op-left toklevels) (smie-op-right toklevels)))
 
+(defvar smie--levels nil
+  "Parser token stack lambda-bound in smie-next-sexp; `next-token' can examine stack to help refine.")
+
 (defun smie-next-sexp (next-token next-sexp op-forw op-back halfsexp)
   "Skip over one sexp.
 NEXT-TOKEN is a function of no argument that moves forward by one
@@ -691,7 +694,7 @@ Possible return values:
   (nil POS TOKEN): we skipped over a paren-like pair.
   nil: we skipped over an identifier, matched parentheses, ..."
   (catch 'return
-    (let ((levels
+    (let ((smie--levels
            (if (stringp halfsexp)
                (prog1 (list (cdr (assoc halfsexp smie-grammar)))
                  (setq halfsexp nil)))))
@@ -718,32 +721,32 @@ Possible return values:
               ;; A token like a paren-close.
               (assert (numberp ; Otherwise, why mention it in smie-grammar.
                        (funcall op-forw toklevels)))
-              (push toklevels levels))
+              (push toklevels smie--levels))
              (t
-              (while (and levels (< (funcall op-back toklevels)
-                                    (funcall op-forw (car levels))))
-                (setq levels (cdr levels)))
+              (while (and smie--levels (< (funcall op-back toklevels)
+                                          (funcall op-forw (car smie--levels))))
+                (setq smie--levels (cdr smie--levels)))
               (cond
-               ((null levels)
+               ((null smie--levels)
                 (if (and halfsexp (numberp (funcall op-forw toklevels)))
-                    (push toklevels levels)
+                    (push toklevels smie--levels)
                   (throw 'return
                          (prog1 (list (or (car toklevels) t) (point) token)
                            (goto-char pos)))))
                (t
-                (let ((lastlevels levels))
-                  (if (and levels (= (funcall op-back toklevels)
-                                     (funcall op-forw (car levels))))
-                      (setq levels (cdr levels)))
+                (let ((lastlevels smie--levels))
+                  (if (and smie--levels (= (funcall op-back toklevels)
+                                           (funcall op-forw (car smie--levels))))
+                      (setq smie--levels (cdr smie--levels)))
                   ;; We may have found a match for the previously pending
                   ;; operator. Is this the end?
                   (cond
                    ;; Keep looking as long as we haven't matched the
                    ;; topmost operator.
-                   (levels
+                   (smie--levels
                     (cond
                      ((numberp (funcall op-forw toklevels))
-                      (push toklevels levels))
+                      (push toklevels smie--levels))
                      ;; FIXME: For some languages, we can express the grammar
                      ;; OK, but next-sexp doesn't stop where we'd want it to.
                      ;; E.g. in SML, we'd want to stop right in front of
@@ -754,8 +757,8 @@ Possible return values:
                      ;;
                      ;; ((and (functionp (cadr (funcall op-forw toklevels)))
                      ;;       (funcall (cadr (funcall op-forw toklevels))
-                     ;;                levels))
-                     ;;  (setq levels nil))
+                     ;;                smie--levels))
+                     ;;  (setq smie--levels nil))
                      ))
                    ;; We matched the topmost operator. If the new operator
                    ;; is the last in the corresponding BNF rule, we're done.
@@ -766,7 +769,7 @@ Possible return values:
                    ;; and is not associative, it's one of the inner operators
                    ;; (like the "in" in "let .. in .. end"), so keep looking.
                    ((not (smie--associative-p toklevels))
-                    (push toklevels levels))
+                    (push toklevels smie--levels))
                    ;; The new operator is associative. Two cases:
                    ;; - it's really just an associative operator (like + or ;)
                    ;;   in which case we should have stopped right before.
@@ -778,8 +781,8 @@ Possible return values:
                    ;; - it's an associative operator within a larger construct
                    ;;   (e.g. an "elsif"), so we should just ignore it and keep
                    ;;   looking for the closing element.
-                   (t (setq levels lastlevels))))))))
-            levels)
+                   (t (setq smie--levels lastlevels))))))))
+            smie--levels)
         (setq halfsexp nil)))))
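
Separately from the patch, the refined-keyword cache described in the
message (a text property on each keyword plus a maximum valid position)
might look roughly like the sketch below. This is not the code from the
Ada mode linked above: every my-cache-* name is hypothetical, and the
sketch assumes that any edit starting at position BEG makes every entry
at or after BEG stale. It ignores the case of an edit inside a keyword
that starts before BEG.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names throughout; a sketch of the caching idea only.

(defvar my-cache-max 0
  "Largest buffer position whose cached refinement is still trusted.
Zero means nothing is trusted.")
(make-variable-buffer-local 'my-cache-max)

(defun my-cache-put (pos refined)
  "Record REFINED as the refined token for the keyword starting at POS."
  ;; `with-silent-modifications' keeps the property change from marking
  ;; the buffer modified or running the change hooks.
  (with-silent-modifications
    (put-text-property pos (1+ pos) 'my-cache-token refined))
  (setq my-cache-max (max my-cache-max pos)))

(defun my-cache-get (pos)
  "Return the cached refined token at POS, or nil if absent or stale."
  (and (<= pos my-cache-max)
       (get-text-property pos 'my-cache-token)))

(defun my-cache-after-change (beg _end _len)
  "Stop trusting cache entries at or after BEG, the start of an edit.
Intended for `after-change-functions'."
  (setq my-cache-max (min my-cache-max (1- beg))))
```

A mode would install the invalidation with something like
(add-hook 'after-change-functions #'my-cache-after-change nil t), and
its forward and backward token functions would try `my-cache-get' before
falling back to the slow refine function. Stale properties are left in
the buffer and simply ignored until they are overwritten by the next
refinement.
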