From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: SMIE
Date: Thu, 10 Jul 2014 09:32:15 -0400
To: Matt DeBoard
Cc: emacs-devel@gnu.org

> In general I’m having a hard time connecting the dots between the BNF
> grammar table creation, the smie-rules (i.e. :before, :after, etc.),
> tokenization, indentation, and so forth, and how it all comes together
> to make this indentation machine work.

The tokenizer turns the text (sequence of chars) into a sequence of
"tokens", which should be thought of as "words".  This step gets rid of
things like comments and (syntactically insignificant) whitespace.

The BNF then describes which of those tokens are special (people
usually call them "keywords" if they look like human words and
"operators" if they look like math entities) and how they relate to
each other to provide structure.

These two together are sufficient to get C-M-f and C-M-b to do
something useful such as skip from "begin" to the matching "end" and
vice-versa.
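
To make the two pieces above (tokenizer and BNF) a bit more concrete,
here is a minimal sketch of how they might be wired together.  The
"toy-" names and the toy language itself are made up for illustration
and not taken from any real mode; only the smie-* functions belong to
SMIE.  The rules-function is left as `ignore' here, on the assumption
that SMIE's default indentation is acceptable for a first experiment.

```elisp
(require 'smie)

;; Hypothetical toy language with "begin ... end" blocks and
;; "if ... then ... else ..." expressions.
(defvar toy-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2
    '((id)
      (exp ("begin" exp "end")
           ("if" exp "then" exp "else" exp))))))

(defun toy-smie-forward-token ()
  "Skip whitespace and comments, then return the next word as a token."
  (forward-comment (point-max))
  (buffer-substring-no-properties
   (point)
   (progn (skip-syntax-forward "w_") (point))))

(defun toy-smie-backward-token ()
  "Skip whitespace and comments backward, then return the previous word."
  (forward-comment (- (point)))
  (buffer-substring-no-properties
   (point)
   (progn (skip-syntax-backward "w_") (point))))

;; Hypothetical major mode that installs the grammar and the tokenizer.
(define-derived-mode toy-mode prog-mode "Toy"
  "Toy mode used only to illustrate `smie-setup'."
  (smie-setup toy-smie-grammar
              #'ignore          ; no custom indentation rules yet
              :forward-token #'toy-smie-forward-token
              :backward-token #'toy-smie-backward-token))
```

With something along these lines in place, C-M-f right before a
"begin" should move past the matching "end", which is a quick way to
check the grammar before writing any indentation rules.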

C-M-f/C-M-b also do something useful when called right before/after an
"infix" keyword: they skip over the corresponding right/left element.
E.g. C-M-b starting from right after "then" should jump to the matching
"if" (it may jump to either just before it or just after it, depending
on the particular way the BNF is specified), and C-M-f from right
before "+" should jump over the whole expression that's being added.

It's generally very useful to debug the BNF and tokenizer using C-M-f
and C-M-b: if these don't jump to the right place, then the indentation
rules won't work well anyway.  Note that C-M-f and C-M-b may sometimes
stop before and sometimes after the "destination keyword", depending on
details of the BNF grammar, but usually either choice is fine in the
sense that it's "close enough" that the indentation rules can then be
made to work.

The indentation algorithm then works by looking at the immediately
surrounding tokens (the one before and the one after), asking the
rules-function what the indentation rule is for those tokens, and when
needed jumping to the "parent" with (more or less) C-M-b.

> there’s that. I’m slowly starting to pare things away. The bit you
> wrote about :list-intro is interesting. When you say that it sees two
> or more concatenated expressions, how does that tie in to the BNF
> grammar definitions?

When you have "exp1 exp2 exp3", there is no keyword/operator involved
(though keywords/operators may be involved inside exp[123], of course),
so the BNF definition doesn't say anything about it.  When the BNF says
("BEGIN" exp "END"), SMIE interprets it as "we can have any number of
`exp' or any other non-special token between BEGIN and END".  And that
is true of all BNF rules in SMIE (SMIE always accepts any number of
non-special tokens anywhere between the special tokens, and any
repetition).
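
As a rough illustration of the rules-function mentioned above, here is
a sketch for a hypothetical language with BEGIN ... END blocks.  The
function name and the offsets are assumptions chosen for the example;
the :elem, :after and :list-intro kinds are the ones SMIE passes to a
rules-function.

```elisp
(defun toy-smie-rules (kind token)
  "Hypothetical indentation rules for a toy BEGIN ... END language.
KIND is the kind of rule SMIE asks about and TOKEN is its argument."
  (pcase (cons kind token)
    ;; Basic indentation step (an arbitrary choice for this example).
    (`(:elem . basic) 2)
    ;; Indent what follows BEGIN by 2 columns relative to BEGIN.
    (`(:after . "BEGIN") 2)
    ;; Tell SMIE that BEGIN is followed by a list of expressions
    ;; ("exp1 exp2 exp3") rather than by a single expression.
    (`(:list-intro . "BEGIN") t)
    ;; No clause matched: `pcase' returns nil, which leaves the
    ;; decision to SMIE's default behavior.
    ))
```

Such a function would be passed as the second argument to `smie-setup',
and SMIE would then call it with the tokens surrounding point.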

IOW, the handling of concatenated expressions doesn't tie in to the BNF
grammar definition at all, because it is imposed implicitly for all
grammars by SMIE's underlying algorithm.  E.g. the grammar

    (defvar my-smie-bnf
      '((id)
        (exp ("BEGIN" exp "END")
             (id ":=" exp))))

will happily consider "a b c := BEGIN d e END" as a valid `exp'.
Actually, it will also accept "BEGIN a b c END := BEGIN d e END",
because as soon as you have such a "parenthesis-like" element, SMIE
will also accept it anywhere (it is a really dumb parsing algorithm
which pays very little attention to the context).

> One final question. In the case of e.g.
>
> Can't resolve the precedence cycle: .do < else:. < .do
> What does the placement of the dots (left of "do", right of "else:") mean?

SMIE reduces the BNF grammar into a simple table of precedences.  Each
token gets assigned 2 precedences: a left-precedence and
a right-precedence.  So the "." indicates which precedence is affected:
the above says that there's a cycle between the left precedence of "do"
and the right precedence of "else:".

A good way to think about those < and > is as "parentheses".  The rule
"else:. < .do" means that SMIE thinks (because of some part of the BNF
grammar) that if the code looks like:

    ... else: .. .. do ...

it should be parsed as

    ... else: ..(.. do ...

rather than as

    ... else: ..).. do ...

And the rule ".do < else:." (or better said "else:. > .do") says just
the reverse:

    ... else: .. .. do ...

should be parsed as

    ... else: ..).. do ...

rather than as

    ... else: ..(.. do ...

So, SMIE doesn't know which is true.  Sadly, the code I wrote does not
keep track of where those inequalities come from, so it's unable to
point you to the particular part of the BNF rules which made it think
that "else:. < .do" or that ".do < else:.".

> As regarding inclusion in GNU ELPA, I'm just a caretaker for the
> project on behalf of the Elixir-lang people, but as it's already in
> MELPA I'm sure it's fine.

Inclusion into GNU ELPA is slightly different in that it's a half-way
inclusion in Emacs: on the technical side, it means that there's a Git
branch of the code kept alongside Emacs's repository (and to which
Emacs maintainers can write) from which the GNU ELPA package gets
built, and on the legal side it means that the code has to have the
same copyright status as Emacs, which concretely means that all
non-trivial contributors need to have signed some copyright paperwork.


        Stefan