bug#36496: [PATCH] Describe the rx notation in the lisp manual


From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 36496@debbugs.gnu.org
Subject: bug#36496: [PATCH] Describe the rx notation in the lisp manual
Date: Fri, 5 Jul 2019 16:13:52 +0200
Message-ID: <AF152105-C741-45BD-85C6-1584DDE91800@acm.org>
In-Reply-To: <838stdbw8e.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 4773 bytes --]

On 4 July 2019 at 18:28, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> This is a large section.  The ELisp reference is already a large book,
> printed in two separate volumes.  So I think if we want to include
> this section, it will have to be on a separate file that is
> conditionally included @ifnottex.
> 
> Alternatively, we could make this a separate manual.

It is about 7-8 pages in all. One page could be saved by combining the character class descriptions with the existing ones; they are basically the same. However, that would probably preclude splitting the section off into a separate file or manual.

The category names also take up about one page, but that information isn't available anywhere else, since those names are specific to rx. (It would be nice if the names were defined along with the categories, but that isn't the case at present.)

I would prefer @ifnottex to having a separate manual, since one of the points is to make rx feel like a part of elisp and a genuine, practical alternative to regexp strings rather than an add-on. For example, the "Complex Regexp Example" turned out to be a good place for an rx version.

The revised patch (attached) does not separate the contents, because I wanted to hear your opinion on the matter first.

>> The existing `rx' doc string can be left unchanged, or reduced to something more concise, perhaps without a description of the entire rx language but with a manual reference. Suggestions are welcome.
> 
> Yes, the doc string should be reduced to the summary of the
> constructs.

Good, let's do that when the changes to the manual are done.

>> +Bind the name @var{ref} to a submatch that matches @var{rx-expr}@enddots{}.
>   ^^^^^^^^^^^^^^^^^^^^^^^
> "Bind the symbol @var{ref}", no?

Yes, thank you.
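
For illustration only (this is not part of the patch), here is a minimal sketch of how the `let' construct described above might be used inside a pcase `rx' pattern. The helper name `my-split-assignment' and the sample string are hypothetical.

```elisp
(require 'rx)   ; provides the `rx' pattern for pcase

;; Hypothetical helper, for illustration only.
(defun my-split-assignment (string)
  "Return (KEY VALUE) if STRING contains \"KEY=VALUE\", else nil."
  (pcase string
    ;; Each `let' binds its symbol to the text of its submatch.
    ((rx (let key (one-or-more (not (any "="))))
         "="
         (let value (zero-or-more anything)))
     (list key value))))

(my-split-assignment "colour=blue")   ; => ("colour" "blue")
```

The match is made as if by `string-match', so the pattern is not anchored; add `string-start' and `string-end' if the whole string must match.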

>> +or, using shorter synonyms and written more compactly,
> 
> This last line needs @noindent before it.

Added, and in another place.

>> +@table @asis
>> +@item @code{"some-string"}
> 
> Why @code{"..."} and not @samp{...}?  The latter will look better both
> in print and in Info format.

I looked at the result in all formats (PDF, Info, HTML) and came to the opposite conclusion: @code{"..."} makes it clear that the item is a string literal. It's not a strongly held opinion, however.

>> +Corresponding string regexp: @samp{AB@dots{}} (subexpressions in sequence).
>                                ^^^^^^^^^^^^^^^^
> I think this should use @samp{@var{a}@var{b}@dots{}} instead. And
> likewise for the other "corresponding string regexps".  The reason is
> that neither A nor B stand for themselves, literally, they are
> meta-variables.

Right; again I made experiments, and ended up with @samp{@var{A}@var{B}@dots{}}. The upper-case variables looked much better in print and html.

>> +Match the @var{rx}s once or not at all.@*
> 
> "Match @var{rx} or an empty string" sounds better to me.

Much better, thank you. Changed in all places.

>> +Match the @var{rx}s zero or more times, non-greedily.@*
> 
> I would add here a cross-reference to where greedy matching is
> described.

Done, with a separate sub-subheading for the non-greedy stuff.
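
As a sketch of what the non-greedy forms buy (again not part of the patch), the C block comment regexp from the manual text could plausibly be shortened as below. This is one possible answer to the "(how?)" in the footnote; the constant name `my-c-comment-regexp' is hypothetical.

```elisp
(require 'rx)

;; Hypothetical name; one possible non-greedy formulation.
(defconst my-c-comment-regexp
  (rx "/*" (*? anything) "*/")
  "Match a C block comment, stopping at the first closing */.")

(let ((text "a /* x */ b /* y */"))
  (when (string-match my-c-comment-regexp text)
    (match-string 0 text)))            ; => "/* x */"
```

With the greedy `(* anything)' the same search would instead run on to the last `*/' in the text.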

>> +@item @code{(any @var{charset}@dots{})}
> 
> Please don't call this "charset", as that term is already taken by a
> very different creature in Emacs.  I suggest "character set" instead.

Yes, I ended up using "set" since it's shorter and even better in this case.

>> +Each @var{charset} is a character, a string representing the set of
>> +its characters, a range or a character class.  A range is either a
>> +hyphen-separated string like @code{"A-Z"}, or a cons of characters
>> +like @code{(?A . ?Z)}.
> 
> Again, a cross-reference to where "character class" described would be
> good here, as would a @cindex entry for "character class in rx".

Done; the cross-reference is just a "see below" since it's very near.
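
To make the kinds of set concrete, here is a small sketch (not part of the patch) that combines a string range, a single character and a cons range in one `any' form, and shows a hyphen added as a separate character. The constant names are hypothetical.

```elisp
(require 'rx)

;; Hypothetical names, for illustration only.
(defconst my-identifier-regexp
  (rx string-start
      (one-or-more (any "a-z"         ; string range
                        ?_            ; single character
                        (?0 . ?9)))   ; cons range
      string-end))

;; A literal hyphen is given as its own character, not inside a range string.
(defconst my-dashed-word-regexp
  (rx string-start (one-or-more (any "a-z" ?-)) string-end))

(string-match my-identifier-regexp "foo_42")     ; => 0
(string-match my-identifier-regexp "foo-42")     ; => nil
(string-match my-dashed-word-regexp "foo-bar")   ; => 0
```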

>> +@item @code{space}, @code{whitespace}, @code{white}
>> +Match any character that has whitespace syntax.
> 
> Only ASCII or also non-ASCII?  This should be spelled out.

It's a matter of the syntax table; I used the exact formulation of the existing char class description.

>> +@xref{Syntax Class Table} for details.  Please note that
>                            ^
> Comma missing there.

Ah, yes. Apparently, a comma is inserted automatically in the TeX version, so that we get the desired "See Section XIV, page 123, for details"; this is documented. In the info and html versions there is no page number, so a comma doesn't feel like proper English: "See Section XIV, for details" has a distinct German tone to my ears.
An explicit comma after @xref seems to be common in the Emacs manuals, so rather than fight it out I castled the clauses.


[-- Attachment #2: 0001-Describe-the-rx-notation-in-the-elisp-manual-bug-364.patch --]
[-- Type: application/octet-stream, Size: 23535 bytes --]

From fde854686146a1642c958e2871c4b376b1fe09a1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 4 Jul 2019 13:01:52 +0200
Subject: [PATCH] Describe the rx notation in the elisp manual (bug#36496)

* doc/lispref/searching.texi (Regular Expressions): New menu entry.
(Regexp Example): Add rx form of the example.
(Rx Notation, Rx Constructs, Rx Functions): New nodes.
* doc/lispref/control.texi (pcase Macro): Describe the rx pattern.
---
 doc/lispref/control.texi   |  23 ++
 doc/lispref/searching.texi | 559 +++++++++++++++++++++++++++++++++++++
 2 files changed, 582 insertions(+)

diff --git a/doc/lispref/control.texi b/doc/lispref/control.texi
index e308d68b75..625964774d 100644
--- a/doc/lispref/control.texi
+++ b/doc/lispref/control.texi
@@ -618,6 +618,29 @@ pcase Macro
 to @var{body-forms} (thus avoiding an evaluation error on match),
 if any of the sub-patterns let-binds a set of symbols,
 they @emph{must} all bind the same set of symbols.
+
+@anchor{rx in pcase}
+@item (rx @var{rx-expr}@dots{})
+Matches strings against the regexp @var{rx-expr}@dots{}, using the
+@code{rx} regexp notation (@pxref{Rx Notation}), as if by
+@code{string-match}.
+
+In addition to the usual @code{rx} syntax, @var{rx-expr}@dots{} can
+contain the following constructs:
+
+@table @code
+@item (let @var{ref} @var{rx-expr}@dots{})
+Bind the symbol @var{ref} to a submatch that matches
+@var{rx-expr}@enddots{}.  @var{ref} is bound in @var{body-forms} to
+the string of the submatch or @code{nil}, but can also be used in
+@code{backref}.
+
+@item (backref @var{ref})
+Like the standard @code{backref} construct, but @var{ref} can here
+also be a name introduced by a previous @code{(let @var{ref} @dots{})}
+construct.
+@end table
+
 @end table
 
 @anchor{pcase-example-0}
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index ef1cffc446..40a9cb523b 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -254,6 +254,7 @@ Regular Expressions
 @menu
 * Syntax of Regexps::       Rules for writing regular expressions.
 * Regexp Example::          Illustrates regular expression syntax.
+* Rx Notation::             An alternative, structured regexp notation.
 * Regexp Functions::        Functions for operating on regular expressions.
 @end menu
 
@@ -359,6 +360,7 @@ Regexp Special
 preceding expression either once or not at all.  For example,
 @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
 
+@anchor{Non-greedy repetition}
 @item @samp{*?}, @samp{+?}, @samp{??}
 @cindex non-greedy repetition characters in regexp
 These are @dfn{non-greedy} variants of the operators @samp{*}, @samp{+}
@@ -951,6 +953,563 @@ Regexp Example
 beyond the minimum needed to end a sentence.
 @end table
 
+In the @code{rx} notation (@pxref{Rx Notation}), the regexp could be written
+
+@example
+@group
+(rx (any ".?!")                    ; Punctuation ending sentence.
+    (zero-or-more (any "\"')]@}"))  ; Closing quotes or brackets.
+    (or line-end
+        (seq " " line-end)
+        "\t"
+        "  ")                      ; Two spaces.
+    (zero-or-more (any "\t\n ")))  ; Optional extra whitespace.
+@end group
+@end example
+
+Since @code{rx} regexps are just S-expressions, they can be formatted
+and commented as such.
+
+@node Rx Notation
+@subsection The @code{rx} Structured Regexp Notation
+@cindex rx
+@cindex regexp syntax
+
+  As an alternative to the string-based syntax, Emacs provides the
+structured @code{rx} notation based on Lisp S-expressions.  This
+notation is usually easier to read, write and maintain than regexp
+strings, and can be indented and commented freely.  It requires a
+conversion into string form since that is what regexp functions
+expect, but that conversion typically takes place during
+byte-compilation rather than when the Lisp code using the regexp is
+run.
+
+  Here is an @code{rx} regexp@footnote{It could be written much more
+simply with non-greedy operators (how?), but that would make the
+example less interesting.} that matches a block comment in the C
+programming language:
+
+@example
+@group
+(rx "/*"                          ; Initial /*
+    (zero-or-more
+     (or (not (any "*"))          ;  Either non-*,
+         (seq "*"                 ;  or * followed by
+              (not (any "/")))))  ;  non-/
+    (one-or-more "*")             ; At least one star,
+    "/")                          ; and the final /
+@end group
+@end example
+
+@noindent
+or, using shorter synonyms and written more compactly,
+
+@example
+@group
+(rx "/*"
+    (* (| (not (any "*"))
+          (: "*" (not (any "/")))))
+    (+ "*") "/")
+@end group
+@end example
+
+@noindent
+In conventional string syntax, it would be written
+
+@example
+"/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
+@end example
+
+The @code{rx} notation is mainly useful in Lisp code; it cannot be
+used in most interactive situations where a regexp is requested, such
+as when running @code{query-replace-regexp} or in variable
+customization.
+
+@menu
+* Rx Constructs::       Constructs valid in rx forms.
+* Rx Functions::        Functions and macros that use rx forms.
+@end menu
+
+@node Rx Constructs
+@subsubsection Constructs in @code{rx} regexps
+
+The various forms in @code{rx} regexps are described below.  The
+shorthand @var{rx} represents any @code{rx} form, and @var{rx}@dots{}
+means one or more @code{rx} forms.  Where the corresponding string
+regexp syntax is given, @var{A}, @var{B}, @dots{} are string regexp
+subexpressions.
+@c With the new implementation of rx, this can be changed from
+@c 'one or more' to 'zero or more'.
+
+@subsubheading Literals
+
+@table @asis
+@item @code{"some-string"}
+Match the string @samp{some-string} literally.  There are no
+characters with special meaning, unlike in string regexps.
+
+@item @code{?C}
+Match the character @samp{C} literally.
+@end table
+
+@subsubheading Fundamental structure
+
+@table @asis
+@item @code{(seq @var{rx}@dots{})}
+@cindex @code{seq} in rx
+@itemx @code{(sequence @var{rx}@dots{})}
+@cindex @code{sequence} in rx
+@itemx @code{(: @var{rx}@dots{})}
+@cindex @code{:} in rx
+@itemx @code{(and @var{rx}@dots{})}
+@cindex @code{and} in rx
+Match the @var{rx}s in sequence.  Without arguments, the expression
+matches the empty string.@*
+Corresponding string regexp: @samp{@var{A}@var{B}@dots{}}
+(subexpressions in sequence).
+
+@item @code{(or @var{rx}@dots{})}
+@cindex @code{or} in rx
+@itemx @code{(| @var{rx}@dots{})}
+@cindex @code{|} in rx
+Match exactly one of the @var{rx}s, trying from left to right.
+Without arguments, the expression will not match anything at all.@*
+Corresponding string regexp: @samp{@var{A}\|@var{B}\|@dots{}}.
+@end table
+
+@subsubheading Repetition
+
+@table @code
+@item (zero-or-more @var{rx}@dots{})
+@cindex @code{zero-or-more} in rx
+@itemx (0+ @var{rx}@dots{})
+@cindex @code{0+} in rx
+@itemx (* @var{rx}@dots{})
+@cindex @code{*} in rx
+Match the @var{rx}s zero or more times.@*
+Corresponding string regexp: @samp{@var{A}*}
+
+@item (one-or-more @var{rx}@dots{})
+@cindex @code{one-or-more} in rx
+@itemx (1+ @var{rx}@dots{})
+@cindex @code{1+} in rx
+@itemx (+ @var{rx}@dots{})
+@cindex @code{+} in rx
+Match the @var{rx}s one or more times.@*
+Corresponding string regexp: @samp{@var{A}+}
+
+@item (zero-or-one @var{rx}@dots{})
+@cindex @code{zero-or-one} in rx
+@itemx (optional @var{rx}@dots{})
+@cindex @code{optional} in rx
+@itemx (opt @var{rx}@dots{})
+@cindex @code{opt} in rx
+@itemx (? @var{rx}@dots{})
+@cindex @code{?} in rx
+Match the @var{rx}s or an empty string.@*
+Corresponding string regexp: @samp{@var{A}?}
+
+@item (= @var{n} @var{rx}@dots{})
+@cindex @code{=} in rx
+@itemx (repeat @var{n} @var{rx})
+Match the @var{rx}s exactly @var{n} times.@*
+Corresponding string regexp: @samp{@var{A}\@{@var{n}\@}}
+
+@item (>= @var{n} @var{rx}@dots{})
+@cindex @code{>=} in rx
+Match the @var{rx}s @var{n} or more times.@*
+Corresponding string regexp: @samp{@var{A}\@{@var{n},\@}}
+
+@item (** @var{n} @var{m} @var{rx}@dots{})
+@cindex @code{**} in rx
+@itemx (repeat @var{n} @var{m} @var{rx}@dots{})
+@cindex @code{repeat} in rx
+Match the @var{rx}s at least @var{n} but no more than @var{m} times.@*
+Corresponding string regexp: @samp{@var{A}\@{@var{n},@var{m}\@}}
+@end table
+
+@subsubheading Non-greedy repetition
+
+Normally, repetition forms are greedy, in that they attempt to match
+as many times as possible.  The following three forms are non-greedy; they
+try to match as few times as possible (@pxref{Non-greedy repetition}).
+
+@table @code
+@item (*? @var{rx}@dots{})
+@cindex @code{*?} in rx
+Match the @var{rx}s zero or more times, non-greedily.@*
+Corresponding string regexp: @samp{@var{A}*?}
+
+@item (+? @var{rx}@dots{})
+@cindex @code{+?} in rx
+Match the @var{rx}s one or more times, non-greedily.@*
+Corresponding string regexp: @samp{@var{A}+?}
+
+@item (?? @var{rx}@dots{})
+@cindex @code{??} in rx
+Match the @var{rx}s or an empty string, non-greedily.@*
+Corresponding string regexp: @samp{@var{A}??}
+@end table
+
+The greediness of some repetition forms can be controlled using the
+following constructs.  However, it is usually better to use the
+explicit non-greedy forms above instead.
+
+@table @code
+@item (minimal-match @var{rx})
+@cindex @code{minimal-match} in rx
+Match @var{rx}, with @code{zero-or-more}, @code{0+},
+@code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and
+@code{optional} using non-greedy matching.
+
+@item (maximal-match @var{rx})
+@cindex @code{maximal-match} in rx
+Match @var{rx}, with @code{zero-or-more}, @code{0+},
+@code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and
+@code{optional} using greedy matching.  This is the default.
+@end table
+
+@subsubheading Matching single characters
+
+@table @asis
+@item @code{(any @var{set}@dots{})}
+@cindex @code{any} in rx
+@itemx @code{(char @var{set}@dots{})}
+@cindex @code{char} in rx
+@itemx @code{(in @var{set}@dots{})}
+@cindex @code{in} in rx
+@cindex character class in rx
+Match a single character from one of the @var{set}s.  Each @var{set}
+is a character, a string representing the set of its characters, a
+range or a character class (see below).  A range is either a
+hyphen-separated string like @code{"A-Z"}, or a cons of characters
+like @code{(?A . ?Z)}.
+
+Note that hyphen (@code{-}) is special in strings in this construct,
+since it acts as a range separator.  To include a hyphen, add it as a
+separate character or single-character string.@*
+Corresponding string regexp: @samp{[@dots{}]}
+
+@item @code{(not @var{charspec})}
+@cindex @code{not} in rx
+Match a character not included in @var{charspec}.  @var{charspec} can
+be an @code{any}, @code{syntax} or @code{category} form, or a
+character class.@*
+Corresponding string regexp: @samp{[^@dots{}]}, @samp{\S@var{code}},
+@samp{\C@var{code}}
+
+@item @code{not-newline}, @code{nonl}
+@cindex @code{not-newline} in rx
+@cindex @code{nonl} in rx
+Match any character except a newline.@*
+Corresponding string regexp: @samp{.} (dot)
+
+@item @code{anything}
+@cindex @code{anything} in rx
+Match any character.@*
+Corresponding string regexp: @samp{.\|\n} (for example)
+
+@item character class
+@cindex character class in rx
+Match a character from a named character class:
+
+@table @asis
+@item @code{alpha}, @code{alphabetic}, @code{letter}
+Match alphabetic characters.  More precisely, match characters whose
+Unicode @samp{general-category} property indicates that they are
+alphabetic.
+
+@item @code{alnum}, @code{alphanumeric}
+Match alphabetic characters and digits.  More precisely, match
+characters whose Unicode @samp{general-category} property indicates
+that they are alphabetic or decimal digits.
+
+@item @code{digit}, @code{numeric}, @code{num}
+Match the digits @samp{0}--@samp{9}.
+
+@item @code{xdigit}, @code{hex-digit}, @code{hex}
+Match the hexadecimal digits @samp{0}--@samp{9}, @samp{A}--@samp{F}
+and @samp{a}--@samp{f}.
+
+@item @code{cntrl}, @code{control}
+Match any character whose code is in the range 0--31.
+
+@item @code{blank}
+Match horizontal whitespace.  More precisely, match characters whose
+Unicode @samp{general-category} property indicates that they are
+spacing separators.
+
+@item @code{space}, @code{whitespace}, @code{white}
+Match any character that has whitespace syntax
+(@pxref{Syntax Class Table}).
+
+@item @code{lower}, @code{lower-case}
+Match anything lower-case, as determined by the current case table.
+If @code{case-fold-search} is non-@code{nil}, this also matches any
+upper-case letter.
+
+@item @code{upper}, @code{upper-case}
+Match anything upper-case, as determined by the current case table.
+If @code{case-fold-search} is non-@code{nil}, this also matches any
+lower-case letter.
+
+@item @code{graph}, @code{graphic}
+Match any character except whitespace, @acronym{ASCII} and
+non-@acronym{ASCII} control characters, surrogates, and codepoints
+unassigned by Unicode, as indicated by the Unicode
+@samp{general-category} property.
+
+@item @code{print}, @code{printing}
+Match whitespace or a character matched by @code{graph}.
+
+@item @code{punct}, @code{punctuation}
+Match any punctuation character.  (At present, for multibyte
+characters, anything that has non-word syntax.)
+
+@item @code{word}, @code{wordchar}
+Match any character that has word syntax (@pxref{Syntax Class Table}).
+
+@item @code{ascii}
+Match any @acronym{ASCII} character (codes 0--127).
+
+@item @code{nonascii}
+Match any non-@acronym{ASCII} character (but not raw bytes).
+@end table
+
+Corresponding string regexp: @samp{[[:@var{class}:]]}
+
+@item @code{(syntax @var{syntax})}
+@cindex @code{syntax} in rx
+Match a character with syntax @var{syntax}, being one of the following
+names:
+
+@multitable {@code{close-parenthesis}} {Syntax character}
+@headitem Syntax name          @tab Syntax character
+@item @code{whitespace}        @tab @code{-}
+@item @code{punctuation}       @tab @code{.}
+@item @code{word}              @tab @code{w}
+@item @code{symbol}            @tab @code{_}
+@item @code{open-parenthesis}  @tab @code{(}
+@item @code{close-parenthesis} @tab @code{)}
+@item @code{expression-prefix} @tab @code{'}
+@item @code{string-quote}      @tab @code{"}
+@item @code{paired-delimiter}  @tab @code{$}
+@item @code{escape}            @tab @code{\}
+@item @code{character-quote}   @tab @code{/}
+@item @code{comment-start}     @tab @code{<}
+@item @code{comment-end}       @tab @code{>}
+@item @code{string-delimiter}  @tab @code{|}
+@item @code{comment-delimiter} @tab @code{!}
+@end multitable
+
+For details, @pxref{Syntax Class Table}.  Please note that
+@code{(syntax punctuation)} is @emph{not} equivalent to the character class
+@code{punctuation}.@*
+Corresponding string regexp: @samp{\s@var{code}}
+
+@item @code{(category @var{category})}
+@cindex @code{category} in rx
+Match a character in category @var{category}, which is either one of
+the names below or its category character.
+
+@multitable {@code{vowel-modifying-diacritical-mark}} {Category character}
+@headitem Category name                       @tab Category character
+@item @code{space-for-indent}                 @tab space
+@item @code{base}                             @tab @code{.}
+@item @code{consonant}                        @tab @code{0}
+@item @code{base-vowel}                       @tab @code{1}
+@item @code{upper-diacritical-mark}           @tab @code{2}
+@item @code{lower-diacritical-mark}           @tab @code{3}
+@item @code{tone-mark}                        @tab @code{4}
+@item @code{symbol}                           @tab @code{5}
+@item @code{digit}                            @tab @code{6}
+@item @code{vowel-modifying-diacritical-mark} @tab @code{7}
+@item @code{vowel-sign}                       @tab @code{8}
+@item @code{semivowel-lower}                  @tab @code{9}
+@item @code{not-at-end-of-line}               @tab @code{<}
+@item @code{not-at-beginning-of-line}         @tab @code{>}
+@item @code{alpha-numeric-two-byte}           @tab @code{A}
+@item @code{chinese-two-byte}                 @tab @code{C}
+@item @code{greek-two-byte}                   @tab @code{G}
+@item @code{japanese-hiragana-two-byte}       @tab @code{H}
+@item @code{indian-two-byte}                  @tab @code{I}
+@item @code{japanese-katakana-two-byte}       @tab @code{K}
+@item @code{strong-left-to-right}             @tab @code{L}
+@item @code{korean-hangul-two-byte}           @tab @code{N}
+@item @code{strong-right-to-left}             @tab @code{R}
+@item @code{cyrillic-two-byte}                @tab @code{Y}
+@item @code{combining-diacritic}              @tab @code{^}
+@item @code{ascii}                            @tab @code{a}
+@item @code{arabic}                           @tab @code{b}
+@item @code{chinese}                          @tab @code{c}
+@item @code{ethiopic}                         @tab @code{e}
+@item @code{greek}                            @tab @code{g}
+@item @code{korean}                           @tab @code{h}
+@item @code{indian}                           @tab @code{i}
+@item @code{japanese}                         @tab @code{j}
+@item @code{japanese-katakana}                @tab @code{k}
+@item @code{latin}                            @tab @code{l}
+@item @code{lao}                              @tab @code{o}
+@item @code{tibetan}                          @tab @code{q}
+@item @code{japanese-roman}                   @tab @code{r}
+@item @code{thai}                             @tab @code{t}
+@item @code{vietnamese}                       @tab @code{v}
+@item @code{hebrew}                           @tab @code{w}
+@item @code{cyrillic}                         @tab @code{y}
+@item @code{can-break}                        @tab @code{|}
+@end multitable
+
+For more information about currently defined categories, run the
+command @kbd{M-x describe-categories @key{RET}}.  For how to define
+new categories, @pxref{Categories}.@*
+Corresponding string regexp: @samp{\c@var{code}}
+@end table
+
+@subsubheading Zero-width assertions
+
+These all match the empty string, but only in specific places.
+
+@table @asis
+@item @code{line-start}, @code{bol}
+@cindex @code{line-start} in rx
+@cindex @code{bol} in rx
+Match at the beginning of a line.@*
+Corresponding string regexp: @samp{^}
+
+@item @code{line-end}, @code{eol}
+@cindex @code{line-end} in rx
+@cindex @code{eol} in rx
+Match at the end of a line.@*
+Corresponding string regexp: @samp{$}
+
+@item @code{string-start}, @code{bos}, @code{buffer-start}, @code{bot}
+@cindex @code{string-start} in rx
+@cindex @code{bos} in rx
+@cindex @code{buffer-start} in rx
+@cindex @code{bot} in rx
+Match at the start of the string or buffer being matched against.@*
+Corresponding string regexp: @samp{\`}
+
+@item @code{string-end}, @code{eos}, @code{buffer-end}, @code{eot}
+@cindex @code{string-end} in rx
+@cindex @code{eos} in rx
+@cindex @code{buffer-end} in rx
+@cindex @code{eot} in rx
+Match at the end of the string or buffer being matched against.@*
+Corresponding string regexp: @samp{\'}
+
+@item @code{point}
+@cindex @code{point} in rx
+Match at point.@*
+Corresponding string regexp: @samp{\=}
+
+@item @code{word-start}
+@cindex @code{word-start} in rx
+Match at the beginning of a word.@*
+Corresponding string regexp: @samp{\<}
+
+@item @code{word-end}
+@cindex @code{word-end} in rx
+Match at the end of a word.@*
+Corresponding string regexp: @samp{\>}
+
+@item @code{word-boundary}
+@cindex @code{word-boundary} in rx
+Match at the beginning or end of a word.@*
+Corresponding string regexp: @samp{\b}
+
+@item @code{not-word-boundary}
+@cindex @code{not-word-boundary} in rx
+Match anywhere but at the beginning or end of a word.@*
+Corresponding string regexp: @samp{\B}
+
+@item @code{symbol-start}
+@cindex @code{symbol-start} in rx
+Match at the beginning of a symbol.@*
+Corresponding string regexp: @samp{\_<}
+
+@item @code{symbol-end}
+@cindex @code{symbol-end} in rx
+Match at the end of a symbol.@*
+Corresponding string regexp: @samp{\_>}
+@end table
+
+@subsubheading Capture groups
+
+@table @code
+@item (group @var{rx}@dots{})
+@cindex @code{group} in rx
+@itemx (submatch @var{rx}@dots{})
+@cindex @code{submatch} in rx
+Match the @var{rx}s, making the matched text and position accessible
+in the match data.  The first group in a regexp is numbered 1;
+subsequent groups will be numbered one higher than the previous
+group.@*
+Corresponding string regexp: @samp{\(@dots{}\)}
+
+@item (group-n @var{n} @var{rx}@dots{})
+@cindex @code{group-n} in rx
+@itemx (submatch-n @var{n} @var{rx}@dots{})
+@cindex @code{submatch-n} in rx
+Like @code{group}, but explicitly assign the group number @var{n}.
+@var{n} must be positive.@*
+Corresponding string regexp: @samp{\(?@var{n}:@dots{}\)}
+
+@item (backref @var{n})
+@cindex @code{backref} in rx
+Match the text previously matched by group number @var{n}.
+@var{n} must be in the range 1--9.@*
+Corresponding string regexp: @samp{\@var{n}}
+@end table
+
+@subsubheading Dynamic inclusion
+
+@table @code
+@item (literal @var{expr})
+@cindex @code{literal} in rx
+Match the literal string that is the result from evaluating the Lisp
+expression @var{expr}.  The evaluation takes place at call time, in
+the current lexical environment.
+
+@item (regexp @var{expr})
+@cindex @code{regexp} in rx
+@itemx (regex @var{expr})
+@cindex @code{regex} in rx
+Match the string regexp that is the result from evaluating the Lisp
+expression @var{expr}.  The evaluation takes place at call time, in
+the current lexical environment.
+
+@item (eval @var{expr})
+@cindex @code{eval} in rx
+Match the rx form that is the result from evaluating the Lisp
+expression @var{expr}.  The evaluation takes place at macro-expansion
+time for @code{rx}, at call time for @code{rx-to-string},
+in the current global environment.
+@end table
+
+@node Rx Functions
+@subsubsection Functions and macros using @code{rx} regexps
+
+@defmac rx rx-expr@dots{}
+Translate the @var{rx-expr}s to a string regexp, as if they were the
+body of a @code{(seq @dots{})} form.  The @code{rx} macro expands to a
+string constant, or, if @code{literal} or @code{regexp} forms are
+used, a Lisp expression that evaluates to a string.
+@end defmac
+
+@defun rx-to-string rx-expr &optional no-group
+Translate @var{rx-expr} to a string regexp which is returned.
+If @var{no-group} is absent or @code{nil}, bracket the result in a
+non-capturing group, @samp{\(?:@dots{}\)}, if necessary to ensure that
+a postfix operator appended to it will apply to the whole expression.
+
+Arguments to @code{literal} and @code{regexp} forms in @var{rx-expr}
+must be string literals.
+@end defun
+
+The @code{pcase} macro can use @code{rx} expressions as patterns
+directly; @pxref{rx in pcase}.
+
 @node Regexp Functions
 @subsection Regular Expression Functions
 
-- 
2.20.1 (Apple Git-117)
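
For readers of the Rx Functions node above, the following sketch (not part of the patch) contrasts the `rx' macro, whose form is fixed in the source, with `rx-to-string', which translates a form built at run time. The names `my-keyword-regexp' and `my-fixed-keyword-regexp' are hypothetical.

```elisp
(require 'rx)

;; Hypothetical helper: the rx form is assembled at run time,
;; so it goes through the function `rx-to-string'.
(defun my-keyword-regexp (keywords)
  "Return a string regexp matching any of KEYWORDS as a whole symbol.
KEYWORDS is a non-empty list of strings."
  (rx-to-string `(seq symbol-start (or ,@keywords) symbol-end)))

;; With a fixed form, the `rx' macro expands to a string constant.
(defconst my-fixed-keyword-regexp
  (rx symbol-start (or "if" "else") symbol-end))

;; Both regexps find "else" as a whole symbol, at index 2.
(string-match (my-keyword-regexp '("if" "else")) "x else y")
(string-match my-fixed-keyword-regexp "x else y")
```

The exact string produced may differ between Emacs versions, so the sketch checks only the matching behaviour.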


Thread overview: 26+ messages
2019-07-04 12:13 bug#36496: [PATCH] Describe the rx notation in the lisp manual Mattias Engdegård
2019-07-04 14:59 ` Drew Adams
2019-07-04 16:28 ` Eli Zaretskii
2019-07-05 14:13   ` Mattias Engdegård [this message]
2019-07-06  9:08     ` Eli Zaretskii
2019-07-06 11:33       ` Mattias Engdegård
2019-07-06 11:41         ` Eli Zaretskii
2019-07-06 18:56           ` Mattias Engdegård
2019-07-06 19:10             ` Eli Zaretskii
2019-07-06 19:45               ` Mattias Engdegård
2019-07-07  2:29                 ` Eli Zaretskii
2019-07-07 11:31                   ` Mattias Engdegård
2019-07-07 14:33                     ` Eli Zaretskii
2022-04-25 15:12                     ` Lars Ingebrigtsen
2019-07-06 19:12             ` Noam Postavsky
2019-07-06 11:59         ` Noam Postavsky
2019-07-06 23:56         ` Richard Stallman
2019-07-06  0:10   ` Richard Stallman
2019-07-06  6:47     ` Eli Zaretskii
2019-07-06 23:59       ` Richard Stallman
2019-07-07  0:36         ` Drew Adams
2019-07-07 23:51           ` Richard Stallman
2019-07-08  0:56             ` Drew Adams
2019-07-08 23:46               ` Richard Stallman
2019-07-09  0:19                 ` Drew Adams
2019-07-08 23:44             ` Richard Stallman