From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: SMIE documentation
Date: Sun, 28 Nov 2010 15:36:26 -0500
Message-ID: <jwvwrnxur2f.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: dough.gmane.org 1290976696 28556 80.91.229.12 (28 Nov 2010 20:38:16 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Sun, 28 Nov 2010 20:38:16 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Nov 28 21:38:10 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1PMnzE-0001vj-FZ
	for ged-emacs-devel@m.gmane.org; Sun, 28 Nov 2010 21:38:10 +0100
Original-Received: from localhost ([127.0.0.1]:48949 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1PMnzD-0004av-QJ
	for ged-emacs-devel@m.gmane.org; Sun, 28 Nov 2010 15:36:39 -0500
Original-Received: from [140.186.70.92] (port=52450 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PMnz7-0004a6-LU
	for emacs-devel@gnu.org; Sun, 28 Nov 2010 15:36:36 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <monnier@IRO.UMontreal.CA>) id 1PMnz4-0001Xd-JI
	for emacs-devel@gnu.org; Sun, 28 Nov 2010 15:36:33 -0500
Original-Received: from pruche.dit.umontreal.ca ([132.204.246.22]:58171)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <monnier@IRO.UMontreal.CA>) id 1PMnz4-0001XY-9l
	for emacs-devel@gnu.org; Sun, 28 Nov 2010 15:36:30 -0500
Original-Received: from pastel.home (lechon.iro.umontreal.ca [132.204.27.242])
	by pruche.dit.umontreal.ca (8.14.1/8.14.1) with ESMTP id oASKaRDv031072;
	Sun, 28 Nov 2010 15:36:27 -0500
Original-Received: by pastel.home (Postfix, from userid 20848)
	id AEB6EA85F1; Sun, 28 Nov 2010 15:36:26 -0500 (EST)
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
X-NAI-Spam-Level: 
X-NAI-Spam-Score: 0.5
X-NAI-Spam-Rules: 2 Rules triggered
	MAILTO_ONLY_JOB_NO_HTTP=0.5, RV3693=0
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:133212
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/133212>

While Savannah is down, maybe someone will feel like checking my attempt
at documenting SMIE.  See patch below.  I intend to install it on the
emacs-23 branch, in case it matters.


        Stefan


=== modified file 'doc/lispref/modes.texi'
--- doc/lispref/modes.texi	2010-08-22 19:30:26 +0000
+++ doc/lispref/modes.texi	2010-11-28 20:30:57 +0000
@@ -27,6 +27,7 @@
 * Imenu::              How a mode can provide a menu
                          of definitions in the buffer.
 * Font Lock Mode::     How modes can highlight text according to syntax.
+* Auto-Indentation::            How to teach Emacs to indent for a major mode.
 * Desktop Save Mode::  How modes can have buffer state saved between
                          Emacs sessions.
 @end menu
@@ -333,7 +333,7 @@
 programming language, indentation of text according to structure is
 probably useful.  So the mode should set @code{indent-line-function}
 to a suitable function, and probably customize other variables
-for indentation.
+for indentation.  @xref{Auto-Indentation}.
 
 @item
 @cindex keymaps in modes
@@ -3223,6 +3215,651 @@
 reasonably fast.
 @end defvar
 
+@node Auto-Indentation
+@section Auto-indentation of code
+
+For programming languages, an important feature of a major mode is to
+provide automatic indentation.  This is controlled in Emacs by
+@code{indent-line-function} (@pxref{Mode-Specific Indent}).
+Writing a good indentation function can be difficult and to a large
+extent it is still a black art.
+
+Many major mode authors will start by writing a simple indentation
+function that works for simple cases, for example by comparing with the
+indentation of the previous text line.  For most programming languages
+that are not really line-based, this tends to scale very poorly:
+improving such a function to handle more diverse situations becomes
+more and more difficult, and the end result is a large, complex,
+unmaintainable indentation function which nobody dares to touch.
+
+A good indentation function will usually need to actually parse the
+text, according to the syntax of the language.  Luckily, it is not
+necessary to parse the text in as much detail as would be needed
+for a compiler, but on the other hand, the parser embedded in the
+indentation code will want to be somewhat friendly to syntactically
+incorrect code.
+
+Good maintainable indentation functions usually fall into 2 categories:
+either parsing forward from some ``safe'' starting point until the
+position of interest, or parsing backward from the position of interest.
+Neither of the two is a clearly better choice than the other: parsing
+backward is often more difficult than parsing forward because
+programming languages' syntax is designed to be parsed forward, but for
+the purpose of indentation it has the advantage of not needing to
+guess a ``safe'' starting point, and it generally enjoys the property
+that only a minimum of text will be analyzed to decide the
+indentation of a line, so indentation will tend to be unaffected by
+syntax errors in some earlier unrelated piece of code.  Parsing forward
+on the other hand is usually easier and has the advantage of making it
+possible to reindent efficiently a whole region at a time,
+with a single parse.
+
+Rather than write your own indentation function from scratch, it is
+often preferable to try to reuse an existing one or to rely
+on a generic indentation engine.  There are sadly few such
+engines.  The CC-mode indentation code (used for C, C++, Java, Awk
+and a few other such modes) has been made more generic over the years,
+so if your language seems somewhat similar to one of those languages,
+you might try to use that engine.  @c FIXME: documentation?
+Another one is SMIE which tries to take an approach in the spirit
+of Lisp sexps and adapt it to non-Lisp languages.
+
+@menu
+* SMIE::                        A simple minded indentation engine
+@end menu
+
+@node SMIE
+@subsection Simple Minded Indentation Engine
+
+SMIE is a package that provides a generic navigation and indentation
+engine.  Based on a very simple parser using an ``operator
+precedence grammar'', it lets major modes extend the sexp-based
+navigation to non-Lisp languages as well as provide a simple to use but
+reliable auto-indentation.
+
+Operator precedence grammar is a very primitive technology for parsing
+compared to some of the more common techniques used in compilers.
+It has the following characteristics: it is algorithmically efficient,
+its parsing power is very limited, and it is largely unable to detect
+syntax errors, but it has the advantage of being able to parse forward
+just as well as backward.  In practice that means that we can use it for
+indentation based on backward parsing, that it can provide both
+@code{forward-sexp} and @code{backward-sexp} functionality, and also
+that it will naturally work on syntactically incorrect code without any
+extra effort; but it also means that most programming languages cannot
+be parsed correctly, at least not without resorting to some
+special tricks.
+
+@menu
+* SMIE setup::                  SMIE setup and features
+* Operator Precedence Grammars::  A very simple parsing technique
+* SMIE Grammar::                Defining the grammar of a language
+* SMIE Lexer::                  Defining what are tokens
+* SMIE Tricks::                 Working around the parser's limitations
+* SMIE Indentation::            Specifying indentation rules
+* SMIE Indentation Helpers::    Helper functions for indentation rules
+* SMIE Indentation Example::    Sample indentation rules
+@end menu
+
+@node SMIE setup
+@subsubsection SMIE Setup and Features
+
+SMIE is meant to be a one-stop shop for structural navigation and
+various other features which rely on the syntactic structure of code,
+mainly automatic indentation.  The main entry point is @code{smie-setup}
+which is a function typically called while setting up a major mode.
+
+@defun smie-setup grammar rules-function &rest keywords
+Set up SMIE navigation and indentation.
+@var{grammar} is a grammar table generated by @code{smie-prec2->grammar}.
+@var{rules-function} is a function implementing the indentation rules,
+for use as the value of @code{smie-rules-function}.
+@var{keywords} are additional arguments, which can use the following keywords:
+@itemize
+@item
+@code{:forward-token} @var{fun}: Specify the forward lexer to use.
+@item
+@code{:backward-token} @var{fun}: Specify the backward lexer to use.
+@end itemize
+@end defun
+
+Calling this function is sufficient to make commands such as
+@code{forward-sexp}, @code{backward-sexp}, and @code{transpose-sexps}
+be able to properly handle structural elements other than just the paired
+parentheses already handled by syntax tables.  E.g. if the provided
+grammar is precise enough, @code{transpose-sexps} can correctly
+transpose the two arguments of a @code{+} operator, taking into account
+the precedence rules of the language.
+
+It is also sufficient to make TAB indentation work in the expected way,
+and provides some commands you can bind in the major mode keymap.
+Other packages could also use the data it sets up to provide
+structure-aware alignment or make @code{blink-matching-paren} work for
+things like @code{begin...end}.
+
+@deffn Command smie-close-block
+This command closes the most recently opened (and not yet closed) block.
+@end deffn
+
+@deffn Command smie-down-list &optional arg
+This command is like @code{down-list} but also pays attention to nesting
+of tokens other than parentheses, such as @code{begin...end}.
+@end deffn
+
+@node Operator Precedence Grammars
+@subsubsection Operator Precedence Grammars
+
+SMIE's precedence grammars simply give each token a pair of
+precedences: the left-precedence and the right-precedence.  We say
+@code{T1 < T2} if the right-precedence of token @code{T1} is less than
+the left-precedence of token @code{T2}.  A good way to read this
+@code{<} is as a kind of parenthesis: if we find @code{... T1 something
+T2 ...}  then it should be parsed as @code{... T1 (something T2 ...}
+rather than as @code{... T1 something) T2 ...} which would be the case
+if we had @code{T1 > T2}.  If we have @code{T1 = T2}, it means that
+token @code{T2} follows token @code{T1} in the same syntactic construction, so
+typically we have @code{"begin" = "end"}.  Such pairs of precedences are
+sufficient to express left-associativity or right-associativity of infix
+operators, nesting of tokens like parentheses and many other cases.
+
+@c @defvar smie-grammar
+@c This variable is an alist specifying the left and right precedence of
+@c each token.  It is meant to be initialized with the use of one of the
+@c functions below.
+@c @end defvar
+
+@defun smie-prec2->grammar table
+This function takes a @emph{prec2} grammar @var{table} and returns an
+alist suitable for use in @code{smie-setup}.  The @emph{prec2}
+@var{table} is itself meant to be built by one of the functions below.
+@end defun
+
+@defun smie-merge-prec2s &rest tables
+This function takes several @emph{prec2} @var{tables} and merges them
+into a new @emph{prec2} table.
+@end defun
+
+@defun smie-precs->prec2 precs
+This function builds a @emph{prec2} table from a table of precedences
+@var{precs}.  @var{precs} should be a list, sorted by precedence (for
+example @code{"+"} will come before @code{"*"}), of elements of the form
+@code{(@var{assoc} @var{op} ...)}, where each @var{op} is a token that
+acts as an operator and @var{assoc} is their associativity, which can be
+either @code{left}, @code{right}, @code{assoc}, or @code{nonassoc}.
+All operators in one of those elements share the same precedence level
+and associativity.
+@end defun
+
+@defun smie-bnf->prec2 bnf &rest resolvers
+This function lets you specify the grammar using a BNF notation.
+It takes a @var{bnf} description of the grammar along with a set of
+conflict resolution rules @var{resolvers} and
+returns a @emph{prec2} table.
+
+@var{bnf} is a list of nonterminal definitions of the form
+@code{(@var{nonterm} @var{rhs1} @var{rhs2} ...)} where each @var{rhs}
+is a (non-empty) list of terminals (aka tokens) or non-terminals.
+
+Not all grammars are accepted:
+@itemize
+@item
+An @var{rhs} cannot be an empty list (this is not needed, since SMIE
+allows all non-terminals to match the empty string anyway).
+@item
+An @var{rhs} cannot have 2 consecutive non-terminals: any two non-terminals
+must be separated by a terminal (aka token).  This is a fundamental
+limitation of the parsing technology used (operator precedence grammar).
+@end itemize
+
+Additionally, conflicts can occur:
+@itemize
+@item
+The returned @emph{prec2} table holds constraints between pairs of tokens, and
+for any given pair only one constraint can be present, either: T1 < T2,
+T1 = T2, or T1 > T2.
+@item
+A token can either be an @code{opener} (something similar to an open-paren),
+a @code{closer} (like a close-paren), or @code{neither} of the two
+(e.g. an infix operator, or an inner token like @code{"else"}).
+@end itemize
+
+Precedence conflicts can be resolved via @var{resolvers}, which is a list of
+@emph{precs} tables (see @code{smie-precs->prec2}): for each precedence
+conflict, if those @emph{precs} tables specify a particular constraint,
+then this constraint is used instead of reporting a conflict.
+@end defun
+
+@node SMIE Grammar
+@subsubsection Defining the Grammar of a Language
+
+The usual way to define the SMIE grammar of a language is by
+defining a new global variable holding the precedence table by
+giving a set of BNF rules.
+For example:
+@example
+@group
+(require 'smie)
+(defvar sample-smie-grammar
+  (smie-prec2->grammar
+   (smie-bnf->prec2
+@end group
+@group
+    '((id)
+      (inst ("begin" insts "end")
+            ("if" exp "then" inst "else" inst)
+            (id ":=" exp)
+            (exp))
+      (insts (insts ";" insts) (inst))
+      (exp (exp "+" exp)
+           (exp "*" exp)
+           ("(" exps ")"))
+      (exps (exps "," exps) (exp)))
+@end group
+@group
+    '((assoc ";"))
+    '((assoc ","))
+    '((assoc "+") (assoc "*")))))
+@end group
+@end example
+
+@noindent
+A few things to note:
+
+@itemize
+@item
+The above grammar did not explicitly mention the syntax of function
+calls: SMIE will automatically allow any sequence of sexps, such as
+identifiers, balanced parentheses, or @code{begin ... end} blocks
+to appear anywhere anyway.
+@item
+The grammar category @code{id} has no right hand side: this does not
+mean that it can only match the empty string, since as mentioned any
+sequence of sexps can appear anywhere anyway.
+@item
+Because non-terminals cannot appear consecutively in the BNF grammar, it
+is difficult to correctly handle tokens that act as terminators, so the
+above grammar treats @code{";"} as a statement @emph{separator} instead,
+which SMIE can handle very well.
+@item
+Separators used in sequences (such as @code{","} and @code{";"} above)
+are best defined with BNF rules such as @code{(foo (foo "separator" foo) ...)}
+which generate precedence conflicts which are then resolved by giving
+them an explicit @code{(assoc "separator")}.
+@item
+The @code{("(" exps ")")} rule was not needed to pair up parens, since
+SMIE will pair up any chars that are marked as having paren syntax in
+the syntax table.  What this rule does instead (together with the
+definition of @code{exps}) is to make it clear that @code{","} will not
+appear outside of parentheses.
+@item
+Rather than have a single @emph{precs} table to resolve conflicts, it is
+preferable to use several tables, so as to let the BNF part of the
+grammar specify relative precedences where possible.
+@item
+Unless there is a very good reason to prefer @code{left} or
+@code{right}, it is usually preferable to mark operators as associative
+with @code{assoc}.  For that reason @code{"+"} and @code{"*"} are
+defined above as @code{assoc}, although the language defines them
+formally as left associative.
+@end itemize
+
+@node SMIE Lexer
+@subsubsection Defining What is a Token
+
+SMIE comes with a predefined lexical analyzer which uses syntax tables
+in the following way: any sequence of chars that have word or symbol
+syntax is considered as a token, and so is any sequence of chars that
+have punctuation syntax.  This is often a good starting point but is
+rarely actually correct for any given language.  For example, it will
+consider @code{"2,+3"} as being composed of 3 tokens: @code{"2"},
+@code{",+"}, and @code{"3"}.
+
+To describe the lexing rules of your language to SMIE, you will need
+2 functions, one to fetch the next token, and another to fetch the
+previous token.  Those functions will usually first skip whitespace and
+comments and then look at the next chunk of text to see if it
+is a special token.  If so, they should skip the token and return a
+description of it.  Usually this is simply the string extracted from the
+buffer, but this is not necessarily the case.
+For example:
+@example
+@group
+(defvar sample-keywords-regexp
+  (regexp-opt '("+" "*" "," ";" ">" ">=" "<" "<=" ":=" "=")))
+@end group
+@group
+(defun sample-smie-forward-token ()
+  (forward-comment (point-max))
+  (cond
+   ((looking-at sample-keywords-regexp)
+    (goto-char (match-end 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-forward "w_")
+              (point))))))
+@end group
+@group
+(defun sample-smie-backward-token ()
+  (forward-comment (- (point)))
+  (cond
+   ((looking-back sample-keywords-regexp (- (point) 2) t)
+    (goto-char (match-beginning 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-backward "w_")
+              (point))))))
+@end group
+@end example
+
+Notice how those lexers return the empty string when in front of
+parentheses.  This is because SMIE will automatically take care of the
+parentheses defined in the syntax table.  More specifically if the lexer
+returns @code{nil} or an empty string, SMIE will try to handle the corresponding
+text as an sexp according to syntax tables.
+
+@node SMIE Tricks
+@subsubsection Living With a Weak Parser
+
+The parsing technique used by SMIE does not allow tokens to behave
+differently in different contexts.  For most programming languages, this
+will manifest itself as precedence conflicts when converting the BNF
+grammar.
+
+Sometimes, those conflicts can be worked around by expressing
+the grammar slightly differently.  For example for Modula-2 it might
+seem natural to have a BNF grammar that looks like:
+
+@example
+  ...
+  (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+        ("CASE" exp "OF" cases "END")
+        ...)
+  (cases (cases "|" cases) (caselabel ":" insts) ("ELSE" insts))
+  ...
+@end example
+
+But this will create conflicts for @code{"ELSE"} since the IF rule
+implies (among many other things) that @code{"ELSE" = "END"}, but on the
+other hand, since @code{"ELSE"} appears within @code{cases} which
+appears left of @code{"END"}, we also have @code{"ELSE" > "END"}.
+We can solve it either by using:
+@example
+  ...
+  (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+        ("CASE" exp "OF" cases "END")
+        ("CASE" exp "OF" cases "ELSE" insts "END")
+        ...)
+  (cases (cases "|" cases) (caselabel ":" insts))
+  ...
+@end example
+or
+@example
+  ...
+  (inst ("IF" exp "THEN" else "END")
+        ("CASE" exp "OF" cases "END")
+        ...)
+  (else (insts "ELSE" insts))
+  (cases (cases "|" cases) (caselabel ":" insts) (else))
+  ...
+@end example
+
+Some other times, after careful consideration you may conclude that
+those conflicts are not serious and simply resolve them via the
+@var{resolvers} argument of @code{smie-bnf->prec2}.  This is typically
+the case for separators and associative infix operators where we
+add a resolver like @code{'((assoc "|"))}.  Another case where this can
+happen is for the classic @emph{dangling else} problem where we will use
+@code{'((assoc "else" "then"))}.  It can also happen for cases where the
+conflict is real and cannot really be resolved, but it is unlikely to
+pose a problem in practice.
+
+Finally, in many cases some conflicts will remain despite all efforts to
+restructure the grammar.  Do not despair: while the parser cannot be
+made more clever, you can make the lexer as smart as you want.  So, the
+solution is then to look at the tokens involved in the conflict and to
+split one of those tokens into 2 (or more) different tokens.  E.g. if
+the grammar needs to distinguish between two incompatible uses of the
+token @code{"begin"}, make the lexer return different tokens (say
+@code{"begin-fun"} and @code{"begin-plain"}) depending on what kind of
+@code{"begin"} it finds.  This pushes the work of distinguishing the
+different cases to the lexer, which will thus have to look at the
+surrounding text to find ad-hoc clues.
+
+@node SMIE Indentation
+@subsubsection Specifying Indentation Rules
+
+Based on the provided grammar, SMIE will be able to provide automatic
+indentation without any extra effort.  But in practice, this default
+indentation style will probably not be good enough.  You will want to
+tweak it in many different cases.
+
+SMIE indentation is based on the idea that indentation rules should be
+as local as possible.  To this end, it relies on the idea of
+@emph{virtual} indentation, which is the indentation that a particular
+program point would have if it were at the beginning of a line.
+Of course, if that program point is indeed at the beginning of a line,
+the virtual indentation of that point is its current indentation.
+But if not, then SMIE will use the indentation algorithm to compute the
+virtual indentation of that point.  Now in practice, the virtual
+indentation of a program point does not have to be identical to the
+indentation it would have if we inserted a newline before it.  To see
+how this works, the SMIE rule for indentation
+after a @code{@{} in C will not care whether the @code{@{} is standing
+on a line of its own or is at the end of the preceding line.  Instead,
+these different cases will be handled in the indentation rule that
+decides how to indent before a @code{@{}.
+
+Another important concept is the notion of @emph{parent}: The
+@emph{parent} of a token is the head token of the most closely
+enclosing syntactic construct.  For example, the parent of an
+@code{else} will be the @code{if} to which it belongs, and the parent of
+an @code{if}, in turn, will be the lead token of the surrounding
+construct.  The command @code{backward-sexp} will jump from a token to
+its parent, but with some caveats: for @emph{openers} (tokens which
+start a construct, like @code{if}) you need to start with point before
+the token, while for others you need to start with point after the
+token; and @code{backward-sexp} will stop with point before the parent
+token if that is the @emph{opener} of the token of interest, and
+otherwise it will stop with point after the parent token.
+
+SMIE indentation rules are specified with a function that takes two
+arguments @var{method} and @var{arg} where the meaning of @var{arg} and the
+expected return value depend on @var{method}.
+
+@var{method} can be:
+@itemize
+@item
+@code{:after}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use for indentation after @var{arg}.
+@item
+@code{:before}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use to indent @var{arg} itself.
+@item
+@code{:elem}, in which case the function should return either the offset
+to use to indent function arguments (if @var{arg} is the symbol
+@code{arg}) or the basic indentation step (if @var{arg} is the symbol
+@code{basic}).
+@item
+@code{:list-intro}, in which case @var{arg} is a token and the function
+should return non-@code{nil} if the token is followed by a list of expressions
+(not separated by any token) rather than an expression.
+@end itemize
+
+When @var{arg} is a token, the function is called with point just before
+that token.  A return value of @code{nil} always means to fall back on the
+default behavior, so the function should return @code{nil} for arguments it
+does not expect.
+
+@var{offset} can be:
+@itemize
+@item
+@code{nil}: use the default indentation rule.
+@item
+@code{(column . @var{column})}: indent to column @var{column}.
+@item
+@var{number}: offset by @var{number}, relative to a base token which is
+the current token for @code{:after} and its parent for @code{:before}.
+@end itemize
+
+@node SMIE Indentation Helpers
+@subsubsection Helper Functions for Indentation Rules
+
+SMIE provides various functions designed specifically for use in the
+indentation rules function (several of those functions will break if used
+in another context).  These functions all start with the prefix
+@code{smie-rule-}.
+
+@defun smie-rule-bolp
+Return non-@code{nil} if the current token is the first on the line.
+@end defun
+
+@defun smie-rule-hanging-p
+Return non-@code{nil} if the current token is @emph{hanging}.
+A token is @emph{hanging} if it is the last token on the line
+and if it is preceded by other tokens: a lone token on a line is not
+considered as hanging.
+@end defun
+
+@defun smie-rule-next-p &rest tokens
+Return non-@code{nil} if the next token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-prev-p &rest tokens
+Return non-@code{nil} if the previous token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-parent-p &rest parents
+Return non-@code{nil} if the current token's parent is among @var{parents}.
+@end defun
+
+@defun smie-rule-sibling-p
+Return non-@code{nil} if the parent is actually a sibling.
+This is the case for example when the parent of a @code{","} is just the
+previous @code{","}.
+@end defun
+
+@defun smie-rule-parent &optional offset
+Return the proper offset to align with the parent.
+If non-@code{nil}, @var{offset} should be an integer giving an
+additional offset to apply.
+@end defun
+
+@defun smie-rule-separator method
+Indent current token as a @emph{separator}.
+
+By @emph{separator}, we mean here a token whose sole purpose is to
+separate various elements within some enclosing syntactic construct, and
+which does not have any semantic significance in itself (i.e. it would
+typically not exist as a node in an abstract syntax tree).
+
+Such a token is expected to have an associative syntax and be closely
+tied to its syntactic parent.  Typical examples are @code{","} in lists
+of arguments (enclosed inside parentheses), or @code{";"} in sequences
+of instructions (enclosed in a @code{@{...@}} or @code{begin...end}
+block).
+
+@var{method} should be the method name that was passed to
+@code{smie-rules-function}.
+@end defun
+
+@node SMIE Indentation Example
+@subsubsection Sample Indentation Rules
+
+Here is an example of an indentation function:
+
+@example
+(defun sample-smie-rules (kind token)
+  (case kind
+    (:elem (case token
+             (basic sample-indent-basic)))
+    (:after
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((equal token ":=") sample-indent-basic)))
+    (:before
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((member token '("begin" "(" "@{"))
+       (if (smie-rule-hanging-p) (smie-rule-parent)))
+      ((equal token "if")
+       (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+            (smie-rule-parent)))))))
+@end example
+
+@noindent
+A few things to note:
+
+@itemize
+@item
+The first case indicates the basic indentation increment
+to use.  If @code{sample-indent-basic} is nil, then it defaults to the
+global setting @code{smie-indent-basic}.  The major mode could have set
+@code{smie-indent-basic} buffer-locally instead, but that is discouraged.
+
+@item
+The two (identical) rules for the token @code{","} make SMIE try to be
+more clever when the comma separator is placed at the beginning of
+lines; more specifically, it tries to outdent the separator so as to
+align the code after the comma; for example:
+
+@example
+x = longfunctionname (
+        arg1
+      , arg2
+    );
+@end example
+
+@item
+The rule for indentation after @code{":="} is there because otherwise
+SMIE would treat @code{":="} as an infix operator and would align the
+right argument with the left one.
+
+@item
+The rule for indentation before @code{"begin"} is an example of the use
+of virtual indentation: this rule is only used when @code{"begin"} is
+hanging, which can only happen when it is not at the beginning
+of a line, so clearly this is not used when indenting @code{"begin"}
+itself but only when indenting something relative to this
+@code{"begin"}.  Concretely, this rule changes the indentation from:
+
+@example
+    if x > 0 then begin
+            dosomething(x);
+        end
+@end example
+to
+@example
+    if x > 0 then begin
+        dosomething(x);
+    end
+@end example
+
+@item
+The rule for indentation before @code{"if"} is a similar example, where
+the purpose is to treat @code{"else if"} as a single unit, so as to
+align a sequence of tests rather than indent each test further to the
+right.  This function chose to only do this in the case where the
+@code{"if"} is not placed on a separate line, hence the
+@code{smie-rule-bolp} test.
+
+If we know that the @code{"else"} is always aligned with its @code{"if"}
+and is always at the beginning of a line, we can use a more efficient
+rule:
+@example
+((equal token "if")
+ (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+      (save-excursion
+        (sample-smie-backward-token)  ;Jump before the "else".
+        (cons 'column (current-column)))))
+@end example
+
+The advantage of this formulation is that it will reuse the indentation
+of the previous @code{"else"}, rather than having to go all the way back
+to the first @code{"if"} of the sequence.
+@end itemize
+
 @node Desktop Save Mode
 @section Desktop Save Mode
 @cindex desktop save mode
@@ -3276,5 +3913,7 @@
 @end defvar
 
 @ignore
-   arch-tag: 4c7bff41-36e6-4da6-9e7f-9b9289e27c8e
+   Local Variables:
+   fill-column: 72
+   End:
 @end ignore

=== modified file 'doc/lispref/text.texi'
--- doc/lispref/text.texi	2010-11-21 18:07:47 +0000
+++ doc/lispref/text.texi	2010-11-25 20:54:41 +0000
@@ -2205,11 +2205,11 @@
 @defvar indent-line-function
 This variable's value is the function to be used by @key{TAB} (and
 various commands) to indent the current line.  The command
-@code{indent-according-to-mode} does no more than call this function.
+@code{indent-according-to-mode} does little more than call this function.
 
 In Lisp mode, the value is the symbol @code{lisp-indent-line}; in C
 mode, @code{c-indent-line}; in Fortran mode, @code{fortran-indent-line}.
-The default value is @code{indent-relative}.
+The default value is @code{indent-relative}.  @xref{Auto-Indentation}.
 @end defvar
 
 @deffn Command indent-according-to-mode