unofficial mirror of emacs-devel@gnu.org 
* SMIE documentation
@ 2010-11-28 20:36 Stefan Monnier
  2010-11-28 21:56 ` Štěpán Němec
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-11-28 20:36 UTC
  To: emacs-devel

While Savannah is down, maybe someone will feel like checking my attempt
at documenting SMIE.  See patch below.  I intend to install it on the
emacs-23 branch, in case it matters.


        Stefan


=== modified file 'doc/lispref/modes.texi'
--- doc/lispref/modes.texi	2010-08-22 19:30:26 +0000
+++ doc/lispref/modes.texi	2010-11-28 20:30:57 +0000
@@ -27,6 +27,7 @@
 * Imenu::              How a mode can provide a menu
                          of definitions in the buffer.
 * Font Lock Mode::     How modes can highlight text according to syntax.
+* Auto-Indentation::   How to teach Emacs to indent for a major mode.
 * Desktop Save Mode::  How modes can have buffer state saved between
                          Emacs sessions.
 @end menu
@@ -333,7 +333,7 @@
 programming language, indentation of text according to structure is
 probably useful.  So the mode should set @code{indent-line-function}
 to a suitable function, and probably customize other variables
-for indentation.
+for indentation.  @xref{Auto-Indentation}.
 
 @item
 @cindex keymaps in modes
@@ -3223,6 +3215,651 @@
 reasonably fast.
 @end defvar
 
+@node Auto-Indentation
+@section Auto-Indentation of Code
+
+For programming languages, an important feature of a major mode is to
+provide automatic indentation.  This is controlled in Emacs by
+@code{indent-line-function} (@pxref{Mode-Specific Indent}).
+Writing a good indentation function can be difficult and to a large
+extent it is still a black art.
+
+Many major mode authors will start by writing a simple indentation
+function that works for simple cases, for example by comparing with the
+indentation of the previous text line.  For most programming languages
+that are not really line-based, this tends to scale very poorly:
+improving such a function to let it handle more diverse situations tends
+to become more and more difficult, resulting in the end with a large,
+complex, unmaintainable indentation function which nobody dares to touch.
+
+A good indentation function will usually need to actually parse the
+text, according to the syntax of the language.  Luckily, it is not
+necessary to parse the text in as much detail as would be needed
+for a compiler, but on the other hand, the parser embedded in the
+indentation code will want to be somewhat friendly to syntactically
+incorrect code.
+
+Good maintainable indentation functions usually fall into 2 categories:
+either parsing forward from some ``safe'' starting point until the
+position of interest, or parsing backward from the position of interest.
+Neither of the two is a clearly better choice than the other: parsing
+backward is often more difficult than parsing forward because
+programming languages' syntax is designed to be parsed forward, but for
+the purpose of indentation it has the advantage of not needing to
+guess a ``safe'' starting point, and it generally enjoys the property
+that only a minimum of text will be analyzed to decide on the
+indentation of a line, so indentation will tend to be unaffected by
+syntax errors in some earlier unrelated piece of code.  Parsing forward
+on the other hand is usually easier and has the advantage of making it
+possible to reindent efficiently a whole region at a time,
+with a single parse.
+
+Rather than write your own indentation function from scratch, it is
+often preferable to try and reuse some existing ones or to rely
+on a generic indentation engine.  There are sadly few such
+engines.  The CC-mode indentation code (used for C, C++, Java, Awk
+and a few other such modes) has been made more generic over the years,
+so if your language seems somewhat similar to one of those languages,
+you might try to use that engine.  @c FIXME: documentation?
+Another one is SMIE which tries to take an approach in the spirit
+of Lisp sexps and adapt it to non-Lisp languages.
+
+@menu
+* SMIE::                        A simple minded indentation engine
+@end menu
+
+@node SMIE
+@subsection Simple Minded Indentation Engine
+
+SMIE is a package that provides a generic navigation and indentation
+engine.  Based on a very simple parser using an ``operator
+precedence grammar'', it lets major modes extend the sexp-based
+navigation to non-Lisp languages as well as provide a simple to use but
+reliable auto-indentation.
+
+Operator precedence grammar is a very primitive technology for parsing
+compared to some of the more common techniques used in compilers.
+It has the following characteristics: algorithmically efficient, its
+parsing power is very limited, and it is largely unable to detect syntax
+errors, but it has the advantage of being able to parse forward just as
+well as backward.  In practice that means that we can use it for
+indentation based on backward parsing, that it can provide both
+@code{forward-sexp} and @code{backward-sexp} functionality, and also
+that it will naturally work on syntactically incorrect code without any
+extra effort; but it also means that most programming languages cannot
+be parsed correctly, at least not without resorting to some
+special tricks.
+
+@menu
+* SMIE setup::                  SMIE setup and features
+* Operator Precedence Grammars::  A very simple parsing technique
+* SMIE Grammar::                Defining the grammar of a language
+* SMIE Lexer::                  Defining what are tokens
+* SMIE Tricks::                 Working around the parser's limitations
+* SMIE Indentation::            Specifying indentation rules
+* SMIE Indentation Helpers::    Helper functions for indentation rules
+* SMIE Indentation Example::    Sample indentation rules
+@end menu
+
+@node SMIE setup
+@subsubsection SMIE Setup and Features
+
+SMIE is meant to be a one-stop shop for structural navigation and
+various other features which rely on the syntactic structure of code,
+mainly automatic indentation.  The main entry point is @code{smie-setup}
+which is a function typically called while setting up a major mode.
+
+@defun smie-setup grammar rules-function &rest keywords
+Set up SMIE navigation and indentation.
+@var{grammar} is a grammar table generated by @code{smie-prec2->grammar}.
+@var{rules-function} is a set of indentation rules for use on
+@code{smie-rules-function}.
+@var{keywords} are additional arguments, which can use the following keywords:
+@itemize
+@item
+@code{:forward-token} @var{fun}: Specify the forward lexer to use.
+@item
+@code{:backward-token} @var{fun}: Specify the backward lexer to use.
+@end itemize
+@end defun
+
+Calling this function is sufficient to make commands such as
+@code{forward-sexp}, @code{backward-sexp}, and @code{transpose-sexps}
+be able to properly handle structural elements other than just the paired
+parentheses already handled by syntax tables.  E.g. if the provided
+grammar is precise enough, @code{transpose-sexps} can correctly
+transpose the two arguments of a @code{+} operator, taking into account
+the precedence rules of the language.
+
+It is also sufficient to make TAB indentation work in the expected way,
+and provides some commands you can bind in the major mode keymap.
+Other packages could also use the data it sets up to provide
+structure-aware alignment or make @code{blink-matching-paren} work for
+things like @code{begin...end}.
+
+@deffn Command smie-close-block
+This command closes the most recently opened (and not yet closed) block.
+@end deffn
+
+@deffn Command smie-down-list &optional arg
+This command is like @code{down-list} but also pays attention to nesting
+of tokens other than parentheses, such as @code{begin...end}.
+@end deffn
+
+@node Operator Precedence Grammars
+@subsubsection Operator Precedence Grammars
+
+SMIE's precedence grammars simply give to each token a pair of
+precedences: the left-precedence and the right-precedence.  We say
+@code{T1 < T2} if the right-precedence of token @code{T1} is less than
+the left-precedence of token @code{T2}.  A good way to read this
+@code{<} is as a kind of parenthesis: if we find @code{... T1 something
+T2 ...}  then it should be parsed as @code{... T1 (something T2 ...}
+rather than as @code{... T1 something) T2 ...} which would be the case
+if we had @code{T1 > T2}.  If we have @code{T1 = T2}, it means that
+token T2 follows token T1 in the same syntactic construction, so
+typically we have @code{"begin" = "end"}.  Such pairs of precedences are
+sufficient to express left-associativity or right-associativity of infix
+operators, nesting of tokens like parentheses and many other cases.
+
+@c @defvar smie-grammar
+@c This variable is an alist specifying the left and right precedence of
+@c each token.  It is meant to be initialized with the use of one of the
+@c functions below.
+@c @end defvar
+
+@defun smie-prec2->grammar table
+This function takes a @emph{prec2} grammar @var{table} and returns an
+alist suitable for use in @code{smie-setup}.  The @emph{prec2}
+@var{table} is itself meant to be built by one of the functions below.
+@end defun
+
+@defun smie-merge-prec2s &rest tables
+This function takes several @emph{prec2} @var{tables} and merges them
+into a new @emph{prec2} table.
+@end defun
+
+@defun smie-precs->prec2 precs
+This function builds a @emph{prec2} table from a table of precedences
+@var{precs}.  @var{precs} should be a list, sorted by precedence (for
+example @code{"+"} will come before @code{"*"}), of elements of the form
+@code{(@var{assoc} @var{op} ...)}, where @var{op} are tokens that
+act as operators and @var{assoc} is their associativity which can be
+either @code{left}, @code{right}, @code{assoc}, or @code{nonassoc}.
+All operators in one of those elements share the same precedence level
+and associativity.
+@end defun
+
+@defun smie-bnf->prec2 bnf &rest resolvers
+This function lets you specify the grammar using a BNF notation.
+It takes a @var{bnf} description of the grammar along with a set of
+conflict resolution rules @var{resolvers} and
+returns a @emph{prec2} table.
+
+@var{bnf} is a list of nonterminal definitions of the form
+@code{(@var{nonterm} @var{rhs1} @var{rhs2} ...)} where each @var{rhs}
+is a (non-empty) list of terminals (aka tokens) or non-terminals.
+
+Not all grammars are accepted:
+@itemize
+@item
+An @var{rhs} cannot be an empty list (this is not needed, since SMIE
+allows all non-terminals to match the empty string anyway).
+@item
+An @var{rhs} cannot have 2 consecutive non-terminals: between each
+non-terminal needs to be a terminal (aka token).  This is a fundamental
+limitation of the parsing technology used (operator precedence grammar).
+@end itemize
+
+Additionally, conflicts can occur:
+@itemize
+@item
+The returned @emph{prec2} table holds constraints between pairs of tokens, and
+for any given pair only one constraint can be present, either: T1 < T2,
+T1 = T2, or T1 > T2.
+@item
+A token can either be an @code{opener} (something similar to an open-paren),
+a @code{closer} (like a close-paren), or @code{neither} of the two
+(e.g. an infix operator, or an inner token like @code{"else"}).
+@end itemize
+
+Precedence conflicts can be resolved via @var{resolvers}, which is a list of
+@emph{precs} tables (see @code{smie-precs->prec2}): for each precedence
+conflict, if those @code{precs} tables specify a particular constraint,
+then this constraint is used instead of reporting a conflict.
+@end defun
+
+@node SMIE Grammar
+@subsubsection Defining the Grammar of a Language
+
+The usual way to define the SMIE grammar of a language is by
+defining a new global variable holding the precedence table by
+giving a set of BNF rules.
+For example:
+@example
+@group
+(require 'smie)
+(defvar sample-smie-grammar
+  (smie-prec2->grammar
+   (smie-bnf->prec2
+@end group
+@group
+    '((id)
+      (inst ("begin" insts "end")
+            ("if" exp "then" inst "else" inst)
+            (id ":=" exp)
+            (exp))
+      (insts (insts ";" insts) (inst))
+      (exp (exp "+" exp)
+           (exp "*" exp)
+           ("(" exps ")"))
+      (exps (exps "," exps) (exp)))
+@end group
+@group
+    '((assoc ";"))
+    '((assoc ","))
+    '((assoc "+") (assoc "*")))))
+@end group
+@end example
+
+@noindent
+A few things to note:
+
+@itemize
+@item
+The above grammar did not explicitly mention the syntax of function
+calls: SMIE will automatically allow any sequence of sexps, such as
+identifiers, balanced parentheses, or @code{begin ... end} blocks
+to appear anywhere anyway.
+@item
+The grammar category @code{id} has no right hand side: this does not
+mean that it can only match the empty string, since as mentioned any
+sequence of sexps can appear anywhere anyway.
+@item
+Because non-terminals cannot appear consecutively in the BNF grammar, it
+is difficult to correctly handle tokens that act as terminators, so the
+above grammar treats @code{";"} as a statement @emph{separator} instead,
+which SMIE can handle very well.
+@item
+Separators used in sequences (such as @code{","} and @code{";"} above)
+are best defined with BNF rules such as @code{(foo (foo "separator" foo) ...)}
+which generate precedence conflicts which are then resolved by giving
+them an explicit @code{(assoc "separator")}.
+@item
+The @code{("(" exps ")")} rule was not needed to pair up parens, since
+SMIE will pair up any chars that are marked as having paren syntax in
+the syntax table.  What this rule does instead (together with the
+definition of @code{exps}) is to make it clear that @code{","} will not
+appear outside of parentheses.
+@item
+Rather than have a single @emph{precs} table to resolve conflicts, it is
+preferable to use several tables, so as to let the BNF part of the
+grammar specify relative precedences where possible.
+@item
+Unless there is a very good reason to prefer @code{left} or
+@code{right}, it is usually preferable to mark operators as associative
+with @code{assoc}.  For that reason @code{"+"} and @code{"*"} are
+defined above as @code{assoc}, although the language defines them
+formally as left associative.
+@end itemize
+
+@node SMIE Lexer
+@subsubsection Defining What is a Token
+
+SMIE comes with a predefined lexical analyzer which uses syntax tables
+in the following way: any sequence of chars that have word or symbol
+syntax is considered as a token, and so is any sequence of chars that
+have punctuation syntax.  This is often a good starting point but is
+rarely actually correct for any given language.  For example, it will
+consider @code{"2,+3"} as being composed of 3 tokens: @code{"2"},
+@code{",+"}, and @code{"3"}.
+
+To describe the lexing rules of your language to SMIE, you will need
+2 functions, one to fetch the next token, and another to fetch the
+previous token.  Those functions will usually first skip whitespace and
+comments and then look at the next chunk of text to see if it
+is a special token, if so it should skip it and return a description of
+this token.  Usually this is simply the string extracted from the
+buffer, but this is not necessarily the case.
+For example:
+@example
+@group
+(defvar sample-keywords-regexp
+  (regexp-opt '("+" "*" "," ";" ">" ">=" "<" "<=" ":=" "=")))
+@end group
+@group
+(defun sample-smie-forward-token ()
+  (forward-comment (point-max))
+  (cond
+   ((looking-at sample-keywords-regexp)
+    (goto-char (match-end 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-forward "w_")
+              (point))))))
+@end group
+@group
+(defun sample-smie-backward-token ()
+  (forward-comment (- (point)))
+  (cond
+   ((looking-back sample-keywords-regexp (- (point) 2) t)
+    (goto-char (match-beginning 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-backward "w_")
+              (point))))))
+@end group
+@end example
+
+Notice how those lexers return the empty string when in front of
+parentheses.  This is because SMIE will automatically take care of the
+parentheses defined in the syntax table.  More specifically if the lexer
+returns nil or an empty string, SMIE will try to handle the corresponding
+text as an sexp according to syntax tables.
+
+@node SMIE Tricks
+@subsubsection Living With a Weak Parser
+
+The parsing technique used by SMIE does not allow tokens to behave
+differently in different contexts.  For most programming languages, this
+will manifest itself by precedence conflicts when converting the BNF
+grammar.
+
+Sometimes, those conflicts can be worked around by expressing
+the grammar slightly differently.  For example for Modula-2 it might
+seem natural to have a BNF grammar that looks like:
+
+@example
+  ...
+  (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+        ("CASE" exp "OF" cases "END")
+        ...)
+  (cases (cases "|" cases) (caselabel ":" insts) ("ELSE" insts))
+  ...
+@end example
+
+But this will create conflicts for @code{"ELSE"} since the IF rule
+implies (among many other things) that @code{"ELSE" = "END"}, but on the
+other hand, since @code{"ELSE"} appears within @code{cases} which
+appears left of @code{"END"}, we also have @code{"ELSE" > "END"}.
+We can solve it either by using:
+@example
+  ...
+  (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+        ("CASE" exp "OF" cases "END")
+        ("CASE" exp "OF" cases "ELSE" insts "END")
+        ...)
+  (cases (cases "|" cases) (caselabel ":" insts))
+  ...
+@end example
+or
+@example
+  ...
+  (inst ("IF" exp "THEN" else "END")
+        ("CASE" exp "OF" cases "END")
+        ...)
+  (else (insts "ELSE" insts))
+  (cases (cases "|" cases) (caselabel ":" insts) (else))
+  ...
+@end example
+
+Some other times, after careful consideration you may conclude that
+those conflicts are not serious and simply resolve them via the
+@var{resolvers} argument of @code{smie-bnf->prec2}.  This is typically
+the case for separators and associative infix operators where we
+add a resolver like @code{'((assoc "|"))}.  Another case where this can
+happen is for the classic @emph{dangling else} problem where we will use
+@code{'((assoc "else" "then"))}.  It can also happen for cases where the
+conflict is real and cannot really be resolved, but it is unlikely to
+pose problems in practice.
+
+Finally, in many cases some conflicts will remain despite all efforts to
+restructure the grammar.  Do not despair: while the parser cannot be
+made more clever, you can make the lexer as smart as you want.  So, the
+solution is then to look at the tokens involved in the conflict and to
+split one of those tokens into 2 (or more) different tokens.  E.g. if
+the grammar needs to distinguish between two incompatible uses of the
+token @code{"begin"}, make the lexer return different tokens (say
+@code{"begin-fun"} and @code{"begin-plain"}) depending on what kind of
+@code{"begin"} it finds.  This pushes the work of distinguishing the
+different cases to the lexer, which will thus have to look at the
+surrounding text to find ad-hoc clues.
+
+@node SMIE Indentation
+@subsubsection Specifying Indentation Rules
+
+Based on the provided grammar, SMIE will be able to provide automatic
+indentation without any extra effort.  But in practice, this default
+indentation style will probably not be good enough.  You will want to
+tweak it in many different cases.
+
+SMIE indentation is based on the idea that indentation rules should be
+as local as possible.  To this end, it relies on the idea of
+@emph{virtual} indentation, which is the indentation that a particular
+program point would have if it were at the beginning of a line.
+Of course, if that program point is indeed at the beginning of a line,
+the virtual indentation of that point is its current indentation.
+But if not, then SMIE will use the indentation algorithm to compute the
+virtual indentation of that point.  Now in practice, the virtual
+indentation of a program point does not have to be identical to the
+indentation it would have if we inserted a newline before it.  To see
+how this works, the SMIE rule for indentation
+after a @code{@{} in C will not care whether the @code{@{} is standing
+on a line of its own or is at the end of the preceding line.  Instead,
+these different cases will be handled in the indentation rule that
+decides how to indent before a @code{@{}.
+
+Another important concept is the notion of @emph{parent}: the
+@emph{parent} of a token is the head token of the most closely
+enclosing syntactic construct.  For example, the parent of an
+@code{else} will be the @code{if} to which it belongs, and the parent of
+an @code{if}, in turn, will be the lead token of the surrounding
+construct.  The command @code{backward-sexp} will jump from a token to
+its parent, but with some caveats: for @emph{openers} (tokens which
+start a construct, like @code{if}) you need to start before, while for
+others you need to start with point after the token; and
+@code{backward-sexp} will stop with point before the parent token if
+that is the @emph{opener} of the token of interest and otherwise it will
+stop with point after the parent token.
+
+SMIE indentation rules are specified with a function that takes two
+arguments @var{method} and @var{arg} where the meaning of @var{arg} and the
+expected return value depends on @var{method}.
+
+@var{method} can be:
+@itemize
+@item
+@code{:after}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use for indentation after @var{arg}.
+@item
+@code{:before}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use to indent @var{arg} itself.
+@item
+@code{:elem}, in which case the function should return either the offset
+to use to indent function arguments (if @var{arg} is the symbol
+@code{arg}) or the basic indentation step (if @var{arg} is the symbol
+@code{basic}).
+@item
+@code{:list-intro}, in which case @var{arg} is a token and the function
+should return non-nil if the token is followed by a list of expressions
+(not separated by any token) rather than an expression.
+@end itemize
+
+When @var{arg} is a token, the function is called with point just before
+that token.  A return value of nil always means to fall back on the
+default behavior, so the function should return nil for arguments it
+does not expect.
+
+@var{offset} can be:
+@itemize
+@item
+@code{nil}: use the default indentation rule.
+@item
+@code{(column . @var{column})}: indent to column @var{column}.
+@item
+@var{number}: offset by @var{number}, relative to a base token which is
+the current token for @code{:after} and its parent for @code{:before}.
+@end itemize
+
+@node SMIE Indentation Helpers
+@subsubsection Helper Functions for Indentation Rules
+
+SMIE provides various functions designed specifically for use in the
+indentation rules function (several of those functions will break if used
+in another context).  These functions all start with the prefix
+@code{smie-rule-}.
+
+@defun smie-rule-bolp
+Return non-@code{nil} if the current token is the first on the line.
+@end defun
+
+@defun smie-rule-hanging-p
+Return non-@code{nil} if the current token is @emph{hanging}.
+A token is @emph{hanging} if it is the last token on the line
+and if it is preceded by other tokens: a lone token on a line is not
+considered as hanging.
+@end defun
+
+@defun smie-rule-next-p &rest tokens
+Return non-@code{nil} if the next token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-prev-p &rest tokens
+Return non-@code{nil} if the previous token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-parent-p &rest parents
+Return non-@code{nil} if the current token's parent is among @var{parents}.
+@end defun
+
+@defun smie-rule-sibling-p
+Return non-@code{nil} if the parent is actually a sibling.
+This is the case for example when the parent of a @code{","} is just the
+previous @code{","}.
+@end defun
+
+@defun smie-rule-parent &optional offset
+Return the proper offset to align with the parent.
+If non-@code{nil}, @var{offset} should be an integer giving an
+additional offset to apply.
+@end defun
+
+@defun smie-rule-separator method
+Indent current token as a @emph{separator}.
+
+By @emph{separator}, we mean here a token whose sole purpose is to
+separate various elements within some enclosing syntactic construct, and
+which does not have any semantic significance in itself (i.e. it would
+typically not exist as a node in an abstract syntax tree).
+
+Such a token is expected to have an associative syntax and be closely
+tied to its syntactic parent.  Typical examples are @code{","} in lists
+of arguments (enclosed inside parentheses), or @code{";"} in sequences
+of instructions (enclosed in a @code{@{...@}} or @code{begin...end}
+block).
+
+@var{method} should be the method name that was passed to
+@code{smie-rules-function}.
+@end defun
+
+@node SMIE Indentation Example
+@subsubsection Sample Indentation Rules
+
+Here is an example of an indentation function:
+
+@example
+(defun sample-smie-rules (kind token)
+  (case kind
+    (:elem (case token
+             (basic sample-indent-basic)))
+    (:after
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((equal token ":=") sample-indent-basic)))
+    (:before
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((member token '("begin" "(" "@{"))
+       (if (smie-rule-hanging-p) (smie-rule-parent)))
+      ((equal token "if")
+       (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+            (smie-rule-parent)))))))
+@end example
+
+@noindent
+A few things to note:
+
+@itemize
+@item
+The first case indicates what is the basic indentation increment
+to use.  If @code{sample-indent-basic} is nil, then it defaults to the
+global setting @code{smie-indent-basic}.  The major mode could have set
+@code{smie-indent-basic} buffer-locally instead, but that is discouraged.
+
+@item
+The two (identical) rules for the token @code{","} make SMIE try to be
+more clever when the comma separator is placed at the beginning of
+lines; more specifically, it tries to outdent the separator so as to
+align the code after the comma; for example:
+
+@example
+x = longfunctionname (
+        arg1
+      , arg2
+    );
+@end example
+
+@item
+The rule for indentation after @code{":="} is there because otherwise
+SMIE would treat @code{":="} as an infix operator and would align the
+right argument with the left one.
+
+@item
+The rule for indentation before @code{"begin"} is an example of the use
+of virtual indentation: this rule is only used when @code{"begin"} is
+hanging, which can only happen when it is not at the beginning
+of a line, so clearly this is not used when indenting @code{"begin"}
+itself but only when indenting something relative to this
+@code{"begin"}.  Concretely, this rule changes the indentation from:
+
+@example
+    if x > 0 then begin
+            dosomething(x);
+        end
+@end example
+to
+@example
+    if x > 0 then begin
+        dosomething(x);
+    end
+@end example
+
+@item
+The rule for indentation before @code{"if"} is a similar example, where
+the purpose is to treat @code{"else if"} as a single unit, so as to
+align a sequence of tests rather than indent each test further to the
+right.  This function chose to only do this in the case where the
+@code{"if"} is not placed on a separate line, hence the
+@code{smie-rule-bolp} test.
+
+If we know that the @code{"else"} is always aligned with its @code{"if"}
+and is always at the beginning of a line, we can use a more efficient
+rule:
+@example
+((equal token "if")
+ (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+      (save-excursion
+        (sample-smie-backward-token)  ;Jump before the "else".
+        (cons 'column (current-column)))))
+@end example
+
+The advantage of this formulation is that it will reuse the indentation
+of the previous @code{"else"}, rather than having to go all the way back
+to the first @code{"if"} of the sequence.
+@end itemize
+
 @node Desktop Save Mode
 @section Desktop Save Mode
 @cindex desktop save mode
@@ -3276,5 +3913,7 @@
 @end defvar
 
 @ignore
-   arch-tag: 4c7bff41-36e6-4da6-9e7f-9b9289e27c8e
+   Local Variables:
+   fill-column: 72
+   End:
 @end ignore

=== modified file 'doc/lispref/text.texi'
--- doc/lispref/text.texi	2010-11-21 18:07:47 +0000
+++ doc/lispref/text.texi	2010-11-25 20:54:41 +0000
@@ -2205,11 +2205,11 @@
 @defvar indent-line-function
 This variable's value is the function to be used by @key{TAB} (and
 various commands) to indent the current line.  The command
-@code{indent-according-to-mode} does no more than call this function.
+@code{indent-according-to-mode} does little more than call this function.
 
 In Lisp mode, the value is the symbol @code{lisp-indent-line}; in C
 mode, @code{c-indent-line}; in Fortran mode, @code{fortran-indent-line}.
-The default value is @code{indent-relative}.
+The default value is @code{indent-relative}.  @xref{Auto-Indentation}.
 @end defvar
 
 @deffn Command indent-according-to-mode
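
For anyone who wants to try the patch out, here is a minimal sketch of
how a major mode might call smie-setup.  It is an illustration only, not
part of the patch: the names `sample-mode' and `sample-indent-basic' and
the cut-down grammar are assumptions made for the example, and it relies
on the default SMIE lexer rather than passing :forward-token and
:backward-token.

```elisp
;; Sketch only: every `sample-' name below is hypothetical.
(require 'smie)

(defvar sample-indent-basic 4
  "Basic indentation step for the hypothetical `sample-mode'.")

(defvar sample-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2
    ;; Cut-down version of the toy grammar used in the patch.
    '((id)
      (inst ("begin" insts "end")
            (id ":=" exp)
            (exp))
      (insts (insts ";" insts) (inst))
      (exp (exp "+" exp)
           (exp "*" exp)))
    ;; Resolvers for the conflicts created by the recursive rules.
    '((assoc ";"))
    '((assoc "+") (assoc "*"))))
  "Precedence table for the hypothetical `sample-mode'.")

(defun sample-smie-rules (kind token)
  "Return an indentation offset for KIND and TOKEN, or nil for the default."
  (cond
   ;; Basic indentation step.
   ((and (eq kind :elem) (eq token 'basic)) sample-indent-basic)
   ;; Indent the right-hand side of an assignment by one step.
   ((and (eq kind :after) (equal token ":=")) sample-indent-basic)))

(define-derived-mode sample-mode fundamental-mode "Sample"
  "Hypothetical major mode that delegates indentation to SMIE."
  ;; :forward-token and :backward-token could be passed here as well.
  (smie-setup sample-smie-grammar #'sample-smie-rules))
```

A real mode would also install a syntax table and most likely its own
lexers, since the default lexer only follows the syntax classes of the
characters and may not split a token such as ":=" the way the grammar
expects.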

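To make the left/right precedence description a bit more concrete, the
sketch below builds a grammar from plain precs tables and reads back the
resulting levels.  It is only an illustration: `sample-ops-grammar', the
two accessor functions and the choice of operators are made up for the
example, and it assumes that each entry of the resulting alist has the
form (TOKEN LEFT-LEVEL RIGHT-LEVEL).

```elisp
;; Sketch only: every `sample-' name below is hypothetical.
(require 'smie)

(defvar sample-ops-grammar
  (smie-prec2->grammar
   (smie-merge-prec2s
    ;; A separator, kept in its own table.  Merging says nothing about
    ;; how ";" relates to the operators of the other table.
    (smie-precs->prec2 '((assoc ";")))
    ;; Infix operators, lowest precedence first.
    (smie-precs->prec2 '((right ":=")
                         (left "+" "-")
                         (left "*" "/")))))
  "Grammar alist built from two precs tables.")

(defun sample-op-left (token)
  "Return the left-precedence of TOKEN in `sample-ops-grammar'."
  (nth 1 (assoc token sample-ops-grammar)))

(defun sample-op-right (token)
  "Return the right-precedence of TOKEN in `sample-ops-grammar'."
  (nth 2 (assoc token sample-ops-grammar)))
```

With these definitions, a left-associative operator such as "+" should
have a right-precedence greater than its own left-precedence ("+" > "+"),
and the right-precedence of "+" should be less than the left-precedence
of "*" ("+" < "*"), which is what makes `a + b * c' group as
`a + (b * c)'.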




* Re: SMIE documentation
  2010-11-28 20:36 SMIE documentation Stefan Monnier
@ 2010-11-28 21:56 ` Štěpán Němec
  2010-12-04 18:01   ` Stefan Monnier
  2010-11-29 17:54 ` Chong Yidong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Štěpán Němec @ 2010-11-28 21:56 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> While Savannah is down, maybe someone will feel like checking my attempt
> at documenting SMIE.  See patch below.  I intend to install it on the
> emacs-23 branch, in case it matters.

[...]

Thank you. A few nits I noticed:

> +@c @defvar smie-grammar
> +@c This variable is an alist specifying the left and right precedence of
> +@c each token.  It is meant to be initialized with the use of one of the
> +@c functions below.
> +@c @end defvar

Why is this commented out?

[...]

> +The returned @emph{prec2} table holds constraints between pairs of token, and
                                                                      ^^^^^
                                                                      tokens
> +for any given pair only one constraint can be present, either: T1 < T2,
> +T1 = T2, or T1 > T2.

[...]

> +returns nil or an empty string, SMIE will try to handle the corresponding
> +text as an sexp according to syntax tables.
           ^^
           a

[...]

> +@code{'((assoc "else" "then"))}.  It can also happen for cases where the
> +conflict is real and cannot really be resolved, but it is unlikely to
> +pose problem in practice.
   ^^^^^^^^^^^^
   problems?

[...]

> +An other important concept is the notion of @emph{parent}: The
   ^^^^^^^^
   another
> +@emph{parent} of a token, is the head token of the most closely
> +enclosing syntactic construct.  For example, the parent of an

What about "nearest enclosing" instead of "most closely enclosing"?

[...]

> +SMIE provides various functions designed specifically for use in the
> +indentation rules function (several of those function will break if used
                                                ^^^^^^^^
                                                functions
> +in another context).  These functions all start with the prefix
> +@code{smie-rule-}.

[...]

> +@defun smie-rule-hanging-p
> +Return non-@code{nil} if the current token is @emph{hanging}.
> +A token is @emph{hanging} if it is at the last token on the line
                                      ^^^
                                      [delete]
> +and if it is preceded by other tokens: a lone token on a line is not

[...]

> +By @emph{separator}, we mean here a token whose sole purpose is to
> +separate various elements within some enclosing syntactic construct, and
> +which does not have any semantic significance in itself (i.e. it would
> +typically no exist as a node in an abstract syntax tree).
             ^^
             not
[...]


  Štěpán




* Re: SMIE documentation
  2010-11-28 20:36 SMIE documentation Stefan Monnier
  2010-11-28 21:56 ` Štěpán Němec
@ 2010-11-29 17:54 ` Chong Yidong
  2010-11-29 21:34   ` Stefan Monnier
  2010-12-07 17:54   ` Stefan Monnier
  2010-12-01  0:39 ` Johan Bockgård
  2010-12-01 19:23 ` Johan Bockgård
  3 siblings, 2 replies; 18+ messages in thread
From: Chong Yidong @ 2010-11-29 17:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

A few quick comments:

> +programming languages's syntax is designed to be parsed forward, but for

This should be "languages'", or "the syntax of programming languages".

> +extra effort; but it also means that most programming languages cannot
> +be parsed correctly, at least not without resorting to some
> +special tricks.

You should xref to "SMIE Tricks", if that is what you are referring to.
In general, it is good to add xrefs to the introduction, even if there
are links to the same nodes in the menu below that introduction, because
that makes it clearer to the reader exactly where to find the more
detailed treatment of each statement in the introduction.

Also, it would be nice to have a better description of what kinds of
languages it's practical to use SMIE for.

> +@node SMIE Grammar
> +@subsubsection Defining the Grammar of a Language
> +
> +The usual way to define the SMIE grammar of a language is by
> +defining a new global variable holding the precedence table by
> +giving a set of BNF rules.
> +For example:

It might be good to give a sample of the kind of language that this
example is supposed to indent.




* Re: SMIE documentation
  2010-11-29 17:54 ` Chong Yidong
@ 2010-11-29 21:34   ` Stefan Monnier
  2010-12-07 17:54   ` Stefan Monnier
  1 sibling, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-11-29 21:34 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

>> +programming languages's syntax is designed to be parsed forward, but for

> This should be "languages'", or "the syntax of programming languages".

>> +extra effort; but it also means that most programming languages cannot
>> +be parsed correctly, at least not without resorting to some
>> +special tricks.

> You should xref to "SMIE Tricks", if that is what you are referring to.
> In general, it is good to add xrefs to the introduction, even if there
> are links to the same nodes in the menu below that introduction, because
> that makes it clearer to the reader exactly where to find the more
> detailed treatment of each statement in the introduction.

Indeed.  Thanks all for the great feedback.

> Also, it would be nice to have a better description of what kinds of
> languages it's practical to use SMIE for.

I truly do not know.  It grew out of my sml-mode indentation algorithm,
and I've used it so far to indent: Octave, SML, Coq, Prolog, and
Modula-2.  I think it's able to handle any language, thanks to the trick
of moving some of the work to the lexer.  But indeed, that might become
impractical for some languages, tho I do not know enough to
characterize this.


        Stefan




* Re: SMIE documentation
  2010-11-28 20:36 SMIE documentation Stefan Monnier
  2010-11-28 21:56 ` Štěpán Němec
  2010-11-29 17:54 ` Chong Yidong
@ 2010-12-01  0:39 ` Johan Bockgård
  2010-12-07 19:27   ` Stefan Monnier
  2010-12-01 19:23 ` Johan Bockgård
  3 siblings, 1 reply; 18+ messages in thread
From: Johan Bockgård @ 2010-12-01  0:39 UTC (permalink / raw)
  To: emacs-devel


Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> While Savannah is down, maybe someone will feel like checking my attempt
> at documenting SMIE.

I have a few comments on the documentation and the functionality itself.

First, here's a patch to make smie-next-sexp correctly return (t POS
TOKEN) when bumping into a closing thingy in the forward direction.

--- a/lisp/emacs-lisp/smie.el
+++ b/lisp/emacs-lisp/smie.el
@@ -653,7 +653,8 @@ Possible return values:
                 (if (and halfsexp (numberp (funcall op-forw toklevels)))
                     (push toklevels levels)
                   (throw 'return
-                         (prog1 (list (or (car toklevels) t) (point) token)
+                         (prog1 (list (or (funcall op-forw toklevels) t)
+                                      (point) token)
                            (goto-char pos)))))
                (t
                 (let ((lastlevels levels))


There's a problem with the indentation when there's a comment at the start
of a line before a keyword, like

    (* Comment *) IF ...                      <-- Press TAB

The virtual indent of the following keyword is computed as its current
column. The comment indentation function tries to align the comment with
this column, making the line wander farther and farther to the right for
every press of the TAB key.

> +Calling this function is sufficient to make commands such as
> +@code{forward-sexp}, @code{backward-sexp}, and @code{transpose-sexps}
> +be able to properly handle structural elements other than just the paired
> +parentheses already handled by syntax tables.  E.g. if the provided
> +grammar is precise enough, @code{transpose-sexps} can correctly
> +transpose the two arguments of a @code{+} operator, taking into account
> +the precedence rules of the language.

This makes C-M-f and friends behave very differently from most other
major modes, which doesn't really feel right.

> +To describe the lexing rules of your language to SMIE, you will need
> +2 functions, one to fetch the next token, and another to fetch the
> +previous token.  Those functions will usually first skip whitespace and
> +comments and then look at the next chunk of text to see if it
> +is a special token, if so it should skip it and return a description of
> +this token.  Usually this is simply the string extracted from the
> +buffer, but this is not necessarily the case.

It would be good if users could hook their own functions into all places
that extract text from the buffer ("buffer-substring"), and not just
smie-forward/backward-token-function; e.g. to use interned token strings
or to handle some kind of banana brackets using the syntax table.
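
To make the quoted paragraph about the two lexer functions concrete, here is a
minimal sketch of such a pair.  The names sketch-keywords-regexp,
sketch-forward-token and sketch-backward-token, and the two-token keyword list,
are invented for this illustration and are not part of SMIE; only standard
buffer primitives are used, and a real mode would likely need more cases.

```elisp
;; Hypothetical keyword list for an imaginary language; not part of SMIE.
(defvar sketch-keywords-regexp (regexp-opt '(":=" ";"))
  "Regexp matching the special tokens of the sketch language.")

(defun sketch-forward-token ()
  "Skip whitespace and comments, then move over and return the next token."
  (forward-comment (point-max))
  (cond
   ;; A special token: step over it and return its text.
   ((looking-at sketch-keywords-regexp)
    (goto-char (match-end 0))
    (match-string-no-properties 0))
   ;; Otherwise take the run of word/symbol characters (possibly empty).
   (t (buffer-substring-no-properties
       (point)
       (progn (skip-syntax-forward "w_") (point))))))

(defun sketch-backward-token ()
  "Skip whitespace and comments backward, then return the previous token."
  (forward-comment (- (point)))
  (cond
   ;; The longest keyword is 2 characters, so look back at most that far.
   ((looking-back sketch-keywords-regexp
                  (max (point-min) (- (point) 2)) t)
    (goto-char (match-beginning 0))
    (match-string-no-properties 0))
   (t (buffer-substring-no-properties
       (point)
       (progn (skip-syntax-backward "w_") (point))))))
```

Both functions return the buffer text of the token, which is the "usually
this is simply the string extracted from the buffer" case; returning an empty
string corresponds to "no token here" in this sketch.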

> +@code{:elem}, in which case the function should return either the offset
> +to use to indent function arguments (if @var{arg} is the symbol
> +@code{arg})

Either there's a bug in the code, or this should be the symbol "args"
and likewise for the doc string.

> +@defun smie-rule-parent &optional offset
> +Return the proper offset to align with the parent.
> +If non-@code{nil}, @var{offset} should be an integer giving an
> +additional offset to apply.
> +@end defun

The function returns an absolute column, not an offset.

> +    (:before
> +     (cond
> +      ((equal token ",") (smie-rule-separator kind))
> +      ((member token '("begin" "(" "@{"))
> +       (if (smie-rule-hanging-p) (smie-rule-parent)))

Does this really work for "("? Most of the time smie-indent--parent
seems to return nil before a paren, which breaks smie-rule-parent.





* Re: SMIE documentation
  2010-11-28 20:36 SMIE documentation Stefan Monnier
                   ` (2 preceding siblings ...)
  2010-12-01  0:39 ` Johan Bockgård
@ 2010-12-01 19:23 ` Johan Bockgård
  2010-12-04  4:41   ` Stefan Monnier
  3 siblings, 1 reply; 18+ messages in thread
From: Johan Bockgård @ 2010-12-01 19:23 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> +@example
> +(defun sample-smie-rules (kind token)
> +  (case kind

`case' is not (yet) a standard part of Emacs Lisp.




* Re: SMIE documentation
  2010-12-01 19:23 ` Johan Bockgård
@ 2010-12-04  4:41   ` Stefan Monnier
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-12-04  4:41 UTC (permalink / raw)
  To: emacs-devel

>> +@example
>> +(defun sample-smie-rules (kind token)
>> +  (case kind

> `case' is not (yet) a standard part of Emacs Lisp.

Good point, thank you, I've added a (eval-when-compile (require 'cl)).
Of course, in Emacs-24, you should use pcase (and so should this texi file).
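
For illustration only, a rules function written with pcase rather than case
might look roughly like the sketch below.  The function name
sketch-smie-rules, the tokens and the offsets are all invented here; it only
shows the dispatch on (KIND . TOKEN) and assumes an Emacs where pcase is
available (Emacs 24 or later).

```elisp
;; Hypothetical rules function; tokens and offsets are invented for the sketch.
(defun sketch-smie-rules (kind token)
  "Return an indentation offset for KIND and TOKEN, or nil for the default."
  (pcase (cons kind token)
    (`(:elem . basic) 2)        ; basic indentation step
    (`(:elem . args) 0)         ; offset for function arguments
    (`(:after . ":=") 2)        ; indent what follows an assignment
    (`(:before . "end") 0)))    ; no extra offset before "end"
```

When no clause matches, pcase returns nil, which corresponds to falling back on
the default indentation rule.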


        Stefan




* Re: SMIE documentation
  2010-11-28 21:56 ` Štěpán Němec
@ 2010-12-04 18:01   ` Stefan Monnier
  2010-12-04 19:46     ` Drew Adams
  2010-12-04 19:56     ` Štěpán Němec
  0 siblings, 2 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-12-04 18:01 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: emacs-devel

> Thank you. A few nits I noticed:

Thanks, I integrated your fixes, except below:

>> +@c @defvar smie-grammar
>> +@c This variable is an alist specifying the left and right precedence of
>> +@c each token.  It is meant to be initialized with the use of one of the
>> +@c functions below.
>> +@c @end defvar

> Why is this commented out?

Because I'd rather not document this yet.  It doesn't have a "smie--"
prefix so it's not marked as internal, but it's not really external
either (it's set via smie-setup).  I may want to make it internal and/or
to change its representation.

>> +returns nil or an empty string, SMIE will try to handle the corresponding
>> +text as an sexp according to syntax tables.
>            ^^
>            a

Hmm... In my several years in the states, I spent a fair bit of time
hacking on Emacs, but I didn't talk much about sexps (I talked about
types instead), so I'm not sure how people pronounce them usually, but
I pronounce them "ess-exps", which is why I put an "an" rather than an
"a".  So I'd first need confirmation that it's indeed pronounced as
something like "sexp".

> What about "nearest enclosing" instead of "most closely enclosing"?

Thanks, that's what I was looking for,


        Stefan




* RE: SMIE documentation
  2010-12-04 18:01   ` Stefan Monnier
@ 2010-12-04 19:46     ` Drew Adams
  2010-12-04 20:21       ` Andreas Schwab
  2010-12-04 21:04       ` Bob Rogers
  2010-12-04 19:56     ` Štěpán Němec
  1 sibling, 2 replies; 18+ messages in thread
From: Drew Adams @ 2010-12-04 19:46 UTC (permalink / raw)
  To: 'Stefan Monnier', 'Štepán Nemec'; +Cc: emacs-devel

> >> +text as an sexp according to syntax tables.
> >            ^^
> >            a
> 
> Hmm... In my several years in the states,

You mean Montreal is not in the States? ;-)

> I spent a fair bit of time hacking on Emacs, but I didn't talk
> much about sexps (I talked about types instead), so I'm not
> sure how people pronounce them usually, but I pronounce them
> "ess-exps", which is why I put an "an" rather than an
> "a".  So I'd first need confirmation that it's indeed pronounced as
> something like "sexp".

You might find usage variable.[*]  Personally, I wouldn't dream of saying "ess-exp" (I can hardly pronounce it) - I always say "sexp".  But then I don't say "ess-ex" either, unless referring to the county in Britland. ;-)

"A sexp" is used in the Emacs and Elisp manuals. "An sexp" appears nowhere in the manuals. And "a sexp" is generally used for Lisp, AFAIK.

Googling "+sexp pronunciation lisp" is no help, BTW. Likewise the Wikipedia entry for S-expression. And oddly s-expression, sexp, and symbolic expression are absent from the Common Lisp HyperSpec's index and glossary (is there a searchable version?). And I cannot find anything about this in CLTL2 (is there a searchable version that works?).

---

[* The pronunciation of `SQL' varies, for instance -

It is typically pronounced "sequel" in Oracleland - that is reflected in the Oracle doc.

But as Wikipedia says: `Officially pronounced /ˌɛskjuːˈɛl/ like "S-Q-L" but often pronounced /ˈsiːkwəl/ like "sequel"'. 

One post at http://www.orafaq.com/forum/t/155444/0/: `A lot of non-native English speakers pronounce it as S.Q.L in their mother-tongue and then "translate" it to Es Que El in English, whereas most native English speakers I know pronounce it as sequel.'

About.com: `By the way, the correct pronunciation of SQL is a contentious issue within the database community. In their SQL standard, the American National Standards Institute declared that the official pronunciation is "es queue el." However, many database professionals have taken to the slang pronunciation "sequel." The choice is yours.' 

One post at http://stackoverflow.com/questions/23886/sql-pronunciation: `A long time ago IBM had a database with "QUEL" (QUEry Language). It was followed up with "SEQUEL" (a joke, since it was a sequel to the first language). The pronunciation followed through to "SQL", which is officially "ess-que-ell". So both are considered correct by most people.']





* Re: SMIE documentation
  2010-12-04 18:01   ` Stefan Monnier
  2010-12-04 19:46     ` Drew Adams
@ 2010-12-04 19:56     ` Štěpán Němec
  2010-12-06 18:26       ` Stefan Monnier
  1 sibling, 1 reply; 18+ messages in thread
From: Štěpán Němec @ 2010-12-04 19:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> Thank you. A few nits I noticed:
>
> Thanks, I integrated your fixes, except below:
>
>>> +@c @defvar smie-grammar
>>> +@c This variable is an alist specifying the left and right precedence of
>>> +@c each token.  It is meant to be initialized with the use of one of the
>>> +@c functions below.
>>> +@c @end defvar
>
>> Why is this commented out?
>
> Because I'd rather not document this yet.  It doesn't have a "smie--"
> prefix so it's not marked as internal, but it's not really external
> either (it's set via smie-setup).  I may want to make it internal and/or
> to change its representation.

OK. Maybe adding a short comment to that effect would be helpful, to
prevent more people like me wondering about the reason?

>>> +returns nil or an empty string, SMIE will try to handle the corresponding
>>> +text as an sexp according to syntax tables.
>>            ^^
>>            a
>
> Hmm... In my several years in the states, I spent a fair bit of time
> hacking on Emacs, but I didn't talk much about sexps (I talked about
> types instead), so I'm not sure how people pronounce them usually, but
> I pronounce them "ess-exps", which is why I put an "an" rather than an
> "a".  So I'd first need confirmation that it's indeed pronounced as
> something like "sexp".

Right, I suspected as much. I pronounce "sexp" as written, but I'm not a
native speaker, so I'd be curious about the prevailing usage, too. In
any case, grepping the Emacs repository only reveals three occurrences of
"an sexp", all others (including all Texinfo docs) being "a sexp".

  Štěpán




* Re: SMIE documentation
  2010-12-04 19:46     ` Drew Adams
@ 2010-12-04 20:21       ` Andreas Schwab
  2010-12-04 21:36         ` Drew Adams
  2010-12-04 21:04       ` Bob Rogers
  1 sibling, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2010-12-04 20:21 UTC (permalink / raw)
  To: Drew Adams; +Cc: 'Štepán Nemec', 'Stefan Monnier', emacs-devel

http://en.wikipedia.org/wiki/S-expression

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* RE: SMIE documentation
  2010-12-04 19:46     ` Drew Adams
  2010-12-04 20:21       ` Andreas Schwab
@ 2010-12-04 21:04       ` Bob Rogers
  1 sibling, 0 replies; 18+ messages in thread
From: Bob Rogers @ 2010-12-04 21:04 UTC (permalink / raw)
  To: Drew Adams
  Cc: 'Štepán Nemec', 'Stefan Monnier',
	emacs-devel

   From: "Drew Adams" <drew.adams@oracle.com>
   Date: Sat, 4 Dec 2010 11:46:11 -0800

   You might find usage variable.[*] Personally, I wouldn't dream of
   saying "ess-exp" (I can hardly pronounce it) - I always say "sexp".
   But then I don't say "ess-ex" either, unless referring to the county
   in Britland. ;-) . . .

Likewise, I've always said "SEX pee" (for some 30 years now).  I've
sometimes seen it spelled "s-exp", perhaps by writers objecting to the
off-color flavor, but I can't recall anyone ever pronouncing it that way
(with the possible exception of Lisp newbies).

   Googling "+sexp pronunciation lisp" is no help, BTW. Likewise the
   Wikipedia entry for S-expression. And oddly s-expression, sexp, and
   symbolic expression are absent from the Common Lisp HyperSpec's index
   and glossary (is there a searchable version?).

CLHS appears to prefer more specific terms like "expression", "list",
etc.  (Really, when you are talking about Lisp, prefixing "symbolic" is
kinda redundant, right?  ;-)

   And I cannot find anything about this in CLTL2 (is there a searchable
   version that works?).

In my paper copy, I find ~S described as "the S-expression format
directive", but no other use of the term.  (It appears in the index,
right above "sex".  ;-)

					-- Bob Rogers
					   http://www.rgrjr.com/




* RE: SMIE documentation
  2010-12-04 20:21       ` Andreas Schwab
@ 2010-12-04 21:36         ` Drew Adams
  0 siblings, 0 replies; 18+ messages in thread
From: Drew Adams @ 2010-12-04 21:36 UTC (permalink / raw)
  To: 'Andreas Schwab'
  Cc: 'Štepán Nemec', 'Stefan Monnier',
	emacs-devel

> http://en.wikipedia.org/wiki/S-expression

As I said:

> > Googling "+sexp pronunciation lisp" is no help, BTW.
> > Likewise the Wikipedia entry for S-expression.

I see nothing at that URL (or in any of the cross-ref'd references there) that
helps answer the pronunciation question.





* Re: SMIE documentation
  2010-12-04 19:56     ` Štěpán Němec
@ 2010-12-06 18:26       ` Stefan Monnier
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-12-06 18:26 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: emacs-devel

>> Because I'd rather not document this yet.  It doesn't have a "smie--"
>> prefix so it's not marked as internal, but it's not really external
>> either (it's set via smie-setup).  I may want to make it internal and/or
>> to change its representation.
> OK. Maybe adding a short comment to that effect would be helpful, to
> prevent more people like me wondering about the reason?

Yes, done, thank you.

>>>> +returns nil or an empty string, SMIE will try to handle the corresponding
>>>> +text as an sexp according to syntax tables.
>>> ^^
>>> a
>> 
>> Hmm... In my several years in the states, I spent a fair bit of time
>> hacking on Emacs, but I didn't talk much about sexps (I talked about
>> types instead), so I'm not sure how people pronounce them usually, but
>> I pronounce them "ess-exps", which is why I put an "an" rather than an
>> "a".  So I'd first need confirmation that it's indeed pronounced as
>> something like "sexp".

> Right, I suspected as much. I pronounce "sexp" as written, but I'm not a
> native speaker, so I'd be curious about the prevailing usage, too.  In
> any case, grepping the Emacs repository only reveals three occurrences of
> "an sexp", all others (including all Texinfo docs) being "a sexp".

Indeed "a sexp" seems to be the winner.  Not sure I'll be able to adapt my
pronunciation of it now that I'm used to it (tho it's only pronounced
in my head anyway, not like it matters).


        Stefan




* Re: SMIE documentation
  2010-11-29 17:54 ` Chong Yidong
  2010-11-29 21:34   ` Stefan Monnier
@ 2010-12-07 17:54   ` Stefan Monnier
  1 sibling, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2010-12-07 17:54 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

>> +programming languages's syntax is designed to be parsed forward, but for
> This should be "languages'",

I'm pretty sure "languages's" is OK as well.  Last I checked, it depends
on your religion ;-)

> or "the syntax of programming languages".

Or better yet, just "programming languages".  Thanks for bringing it to
my attention.


        Stefan




* Re: SMIE documentation
  2010-12-01  0:39 ` Johan Bockgård
@ 2010-12-07 19:27   ` Stefan Monnier
  2012-05-13 11:51     ` Johan Bockgård
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2010-12-07 19:27 UTC (permalink / raw)
  To: emacs-devel

> First, here's a patch to make smie-next-sexp correctly return (t POS
> TOKEN) when bumping into a closing thingy in the forward direction.

> --- a/lisp/emacs-lisp/smie.el
> +++ b/lisp/emacs-lisp/smie.el
> @@ -653,7 +653,8 @@ Possible return values:
>                  (if (and halfsexp (numberp (funcall op-forw toklevels)))
>                      (push toklevels levels)
>                    (throw 'return
> -                         (prog1 (list (or (car toklevels) t) (point) token)
> +                         (prog1 (list (or (funcall op-forw toklevels) t)
> +                                      (point) token)
>                             (goto-char pos)))))
>                 (t
>                  (let ((lastlevels levels))

I do not understand what case this intends to solve.  When I try to
"bump into a closing thingy in the forward direction" with the current
code, it seems to work correctly.

> There's a problem with the indentation when there's a comment at the start
> of a line before a keyword, like

>     (* Comment *) IF ...                      <-- Press TAB

> The virtual indent of the following keyword is computed as its current
> column. The comment indentation function tries to align the comment with
> this column, making the line wander farther and farther to the right for
> every press of the TAB key.

Good catch, thanks.  Don't have a fix yet, sadly.

>> +Calling this function is sufficient to make commands such as
>> +@code{forward-sexp}, @code{backward-sexp}, and @code{transpose-sexps}
>> +be able to properly handle structural elements other than just the paired
>> +parentheses already handled by syntax tables.  E.g. if the provided
>> +grammar is precise enough, @code{transpose-sexps} can correctly
>> +transpose the two arguments of a @code{+} operator, taking into account
>> +the precedence rules of the language.

> This makes C-M-f and friends behave very differently from most other
> major modes, which doesn't really feel right.

I think it doesn't feel right because you're not yet used to it.
What it does is make C-M-[fbt] behave similar to how they behave in
Lisp languages.  It's very intuitive when skipping over begin...end.
It's admittedly less obviously intuitive for infix/mixfix constructs.

>> +To describe the lexing rules of your language to SMIE, you will need
>> +2 functions, one to fetch the next token, and another to fetch the
>> +previous token.  Those functions will usually first skip whitespace and
>> +comments and then look at the next chunk of text to see if it
>> +is a special token, if so it should skip it and return a description of
>> +this token.  Usually this is simply the string extracted from the
>> +buffer, but this is not necessarily the case.
> It would be good if users could hook their own functions into all places
> that extract text from the buffer ("buffer-substring"), and not just
> smie-forward/backward-token-function; e.g. to use interned token strings
> or to handle some kind of banana brackets using the syntax table.

I do not understand what you're referring to.

>> +@code{:elem}, in which case the function should return either the offset
>> +to use to indent function arguments (if @var{arg} is the symbol
>> +@code{arg})
> Either there's a bug in the code, or this should be the symbol "args"
> and likewise for the doc string.

I meant @code{args}, indeed.

>> +@defun smie-rule-parent &optional offset
>> +Return the proper offset to align with the parent.
>> +If non-@code{nil}, @var{offset} should be an integer giving an
>> +additional offset to apply.
>> +@end defun
> The function returns an absolute column, not an offset.

It does not return an absolute column (which would be a number) but
a cons cell (column . N), i.e. it returns an "offset" in the sense of:

   @var{offset} can be:
   @itemize
   @item
   @code{nil}: use the default indentation rule.
   @item
   @code{(column . @var{column})}: indent to column @var{column}.
   @item
   @var{number}: offset by @var{number}, relative to a base token which is
   the current token for @code{:after} and its parent for @code{:before}.
   @end itemize
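
One way to read the three cases above is as the following small sketch.  It is
not SMIE's actual implementation; the function sketch-resolve-offset and its
BASE-COLUMN and DEFAULT-COLUMN arguments are hypothetical, introduced only to
show why a (column . N) cons and a plain number are different kinds of
"offset".

```elisp
;; Hypothetical helper, not part of SMIE: turn an "offset" into a column.
(defun sketch-resolve-offset (offset base-column default-column)
  "Return the column described by OFFSET.
BASE-COLUMN is the column of the base token; DEFAULT-COLUMN stands in
for whatever the default indentation rule would produce."
  (cond
   ;; nil: use the default indentation rule.
   ((null offset) default-column)
   ;; (column . N): indent to exactly column N, ignoring the base token.
   ((eq (car-safe offset) 'column) (cdr offset))
   ;; NUMBER: offset relative to the base token's column.
   ((integerp offset) (+ base-column offset))
   (t (error "Unsupported offset value: %S" offset))))
```

With a base token at column 4, the number 2 gives column 6, whereas
(column . 2) gives column 2 regardless of the base token.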

>> +    (:before
>> +     (cond
>> +      ((equal token ",") (smie-rule-separator kind))
>> +      ((member token '("begin" "(" "@{"))
>> +       (if (smie-rule-hanging-p) (smie-rule-parent)))

> Does this really work for "("?

It works in those cases where I've used it, yes.

> Most of the time smie-indent--parent seems to return nil before
> a paren, which breaks smie-rule-parent.

That would be a bug, indeed.  Please post a test case to reproduce it.


        Stefan




* Re: SMIE documentation
  2010-12-07 19:27   ` Stefan Monnier
@ 2012-05-13 11:51     ` Johan Bockgård
  2012-05-15 13:25       ` Stefan Monnier
  0 siblings, 1 reply; 18+ messages in thread
From: Johan Bockgård @ 2012-05-13 11:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> First, here's a patch to make smie-next-sexp correctly return (t POS
>> TOKEN) when bumping into a closing thingy in the forward direction.
>
>> --- a/lisp/emacs-lisp/smie.el
>> +++ b/lisp/emacs-lisp/smie.el
>> @@ -653,7 +653,8 @@ Possible return values:
>>                  (if (and halfsexp (numberp (funcall op-forw toklevels)))
>>                      (push toklevels levels)
>>                    (throw 'return
>> -                         (prog1 (list (or (car toklevels) t) (point) token)
>> +                         (prog1 (list (or (funcall op-forw toklevels) t)
>> +                                      (point) token)
>>                             (goto-char pos)))))
>>                 (t
>>                  (let ((lastlevels levels))
>
> I do not understand what case this intends to solve.  When I try to
> "bump into a closing thingy in the forward direction" with the current
> code, it seems to work correctly.

1.
In test/indent/modula2.mod,

   (smie-backward-sexp) after IF returns ((172) ...) = "bumped into
   open-thingy",

but

  (smie-forward-sexp) before END returns (0 ...) = "couldn't skip token
  because its level is too high", which is not correct.


2.
Return values of the form ((NUMBER) ...) are not documented.


3.
Doc string of smie-forward-sexp:
"open-paren or the beginning of buffer" should be "close-paren or the end of buffer".


Sorry for the rather late reply.




* Re: SMIE documentation
  2012-05-13 11:51     ` Johan Bockgård
@ 2012-05-15 13:25       ` Stefan Monnier
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2012-05-15 13:25 UTC (permalink / raw)
  To: emacs-devel

> 1.
> In test/indent/modula2.mod,
>    (smie-backward-sexp) after IF returns ((172) ...) = "bumped into
>    open-thingy",
> but
>   (smie-forward-sexp) before END returns (0 ...) = "couldn't skip token
>   because its level is too high", which is not correct.

Duh, yes, of course.  And the other (car toklevels) is similarly wrong,
of course.  Patch doubled and installed, thank you.

> 2.
> Return values of the form ((NUMBER) ...) are not documented.
> 3.
> Doc string of smie-forward-sexp:
> "open-paren or the beginning of buffer" should be "close-paren or the end of buffer".

Thanks, I fixed those.

> There's a problem with the indentation when there's a comment at the start
> of a line before a keyword, like
>     (* Comment *) IF ...                      <-- Press TAB

Also fixed (tho there might still be more cases of "wandering indentation").

> Sorry for the rather late reply.

No problem, I've been caught several times replying after a delay of
4 years.


        Stefan


