* Implement new symbol-start and symbol-end regexp operators
@ 2004-04-29 22:57 Jim Blandy
2004-05-01 9:44 ` Richard Stallman
2004-05-19 16:38 ` Stefan Monnier
0 siblings, 2 replies; 14+ messages in thread
From: Jim Blandy @ 2004-04-29 22:57 UTC
I find these really useful --- finally I can isearch-regexp for
identifiers in code and match exactly what I mean. Synthesizing these
operators from the existing ones is a pain, partly because "symbol
constituents" are those characters whose syntax is *either* 'word' or
'symbol'.
This also makes etags / ctags regexps behave a bit more like Emacs
regexps: '_' is no longer a word constituent. I don't know if that's
actually a feature or not.
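A minimal C sketch of the two position tests makes the "word *or* symbol" point concrete. It works on a plain NUL-terminated string and assumes a fixed toy syntax table in the spirit of the patch's init_syntax_once (alphanumerics have word syntax, '_' has symbol syntax). The names syntax_of, symbol_constituent, at_symbol_start and at_symbol_end are hypothetical, and Emacs buffers, multibyte text and syntax-table properties are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Toy syntax classes, modelled on the patch's init_syntax_once:
   alphanumerics have word syntax, '_' has symbol syntax, and
   everything else falls into a catch-all class.  Names are
   hypothetical. */
enum syn { SYN_OTHER, SYN_WORD, SYN_SYMBOL };

static enum syn syntax_of (unsigned char c)
{
  if (isalnum (c))
    return SYN_WORD;
  if (c == '_')
    return SYN_SYMBOL;
  return SYN_OTHER;
}

/* A symbol constituent is a character whose syntax is word OR symbol. */
static bool symbol_constituent (unsigned char c)
{
  enum syn s = syntax_of (c);
  return s == SYN_WORD || s == SYN_SYMBOL;
}

/* Models \_< at offset POS of STR: the character at POS must be a
   constituent and the character before it (if any) must not be.
   Fails at the end of the string. */
bool at_symbol_start (const char *str, size_t pos)
{
  size_t len = strlen (str);
  assert (pos <= len);
  if (pos == len || !symbol_constituent ((unsigned char) str[pos]))
    return false;
  return pos == 0 || !symbol_constituent ((unsigned char) str[pos - 1]);
}

/* Models \_> at offset POS of STR: the character before POS must be a
   constituent and the character at POS (if any) must not be.
   Fails at the start of the string. */
bool at_symbol_end (const char *str, size_t pos)
{
  size_t len = strlen (str);
  assert (pos <= len);
  if (pos == 0 || !symbol_constituent ((unsigned char) str[pos - 1]))
    return false;
  return pos == len || !symbol_constituent ((unsigned char) str[pos]);
}
```

For "foo_bar baz", at_symbol_start holds only at offsets 0 and 8, and at_symbol_end only at 7 and 11. Word-boundary tests with the same table would also fire on either side of the '_', which is why \< and \> alone are not enough.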
src/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
Add support for new '\_<' and '\_>' regexp operators, matching the
beginnings and ends of symbols.
* regex.c (enum syntaxcode): Add Ssymbol.
(init_syntax_once): Set the syntax for '_' to Ssymbol, not Sword.
(symbeg, symend): New opcodes.
(print_partial_compiled_pattern): Print the new opcodes properly.
(regex_compile): Parse the new operators.
(analyze_first): symbeg and symend match only the empty string.
(mutually_exclusive_p): symend is mutually exclusive with \s_ and
\sw; symbeg is mutually exclusive with \S_ and \Sw.
(re_match_2_internal): Add code for symbeg and symend.
* search.c (trivial_regexp_p): \_ is no longer a trivial regexp.
man/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
* search.texi (Regexps): Document the \_< and \_> regexp operators.
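As we understand search.c, trivial_regexp_p reports whether a regexp contains no operators at all, so that a plain string search can stand in for the regexp engine. Once \_ introduces an operator, a pattern containing it must stop counting as trivial. The C sketch below is a simplified standalone model of that check. Its backslash list is copied from the patched hunk. The list of unescaped special characters is an assumption, since the hunk does not show it, and trivial_regexp_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of search.c's trivial_regexp_p:
   returns true if REGEXP contains no regexp operators, so that a
   plain string search would find the same text.  The unescaped
   specials below are assumed; the backslash list mirrors the
   patched hunk, including the new '_'. */
bool trivial_regexp_sketch (const char *regexp)
{
  assert (regexp != NULL);
  for (const char *s = regexp; *s != '\0'; s++)
    {
      /* *s is never NUL here, so strchr cannot match the terminator. */
      if (strchr (".*+?[^$", *s) != NULL)
        return false;
      if (*s == '\\')
        {
          s++;
          if (*s == '\0')
            return false;       /* trailing backslash */
          if (strchr ("|()`'bB<>wWsS={}_cC123456789", *s) != NULL)
            return false;       /* backslash starts an operator */
          /* Any other "\X" just quotes X. */
        }
    }
  return true;
}
```

With the new '_' entry, trivial_regexp_sketch ("\\_<foo\\_>") is false, while an unescaped foo_bar and a quoted a\.b both remain trivial.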
*** src/regex.c.~2~ 2004-04-29 15:56:53.000000000 -0500
--- src/regex.c 2004-04-29 17:44:24.000000000 -0500
***************
*** 219,225 ****
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
--- 219,225 ----
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
***************
*** 399,405 ****
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Sword;
done = 1;
}
--- 399,405 ----
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Ssymbol;
done = 1;
}
***************
*** 656,661 ****
--- 656,664 ----
wordbound, /* Succeeds if at a word boundary. */
notwordbound, /* Succeeds if not at a word boundary. */
+ symbeg, /* Succeeds if at symbol beginning. */
+ symend, /* Succeeds if at symbol end. */
+
/* Matches any character whose syntax is specified. Followed by
a byte which contains a syntax code, e.g., Sword. */
syntaxspec,
***************
*** 1095,1100 ****
--- 1098,1110 ----
case wordend:
printf ("/wordend");
+ case symbeg:
+ printf ("/symbeg");
+ break;
+
+ case symend:
+ printf ("/symend");
+
case syntaxspec:
printf ("/syntaxspec");
mcnt = *p++;
***************
*** 3135,3140 ****
--- 3145,3163 ----
BUF_PUSH (wordend);
break;
+ case '_':
+ if (syntax & RE_NO_GNU_OPS)
+ goto normal_char;
+ laststart = b;
+ PATFETCH (c);
+ if (c == '<')
+ BUF_PUSH (symbeg);
+ else if (c == '>')
+ BUF_PUSH (symend);
+ else
+ FREE_STACK_RETURN (REG_BADPAT);
+ break;
+
case 'b':
if (syntax & RE_NO_GNU_OPS)
goto normal_char;
***************
*** 3629,3634 ****
--- 3652,3659 ----
case notwordbound:
case wordbeg:
case wordend:
+ case symbeg:
+ case symend:
continue;
***************
*** 4396,4409 ****
break;
case wordend:
! case notsyntaxspec:
return ((re_opcode_t) *p1 == syntaxspec
! && p1[1] == (op2 == wordend ? Sword : p2[1]));
case wordbeg:
! case syntaxspec:
return ((re_opcode_t) *p1 == notsyntaxspec
! && p1[1] == (op2 == wordbeg ? Sword : p2[1]));
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
--- 4421,4440 ----
break;
case wordend:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == Sword);
! case symend:
return ((re_opcode_t) *p1 == syntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case notsyntaxspec:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == p2[1]);
case wordbeg:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == Sword);
! case symbeg:
return ((re_opcode_t) *p1 == notsyntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case syntaxspec:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == p2[1]);
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
***************
*** 5528,5533 ****
--- 5559,5650 ----
}
break;
+ case symbeg:
+ DEBUG_PRINT1 ("EXECUTING symbeg.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the end of string. */
+ if (AT_STRINGS_END (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d);
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ PREFETCH ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ s2 = SYNTAX (c2);
+
+ /* Case 2: S2 is neither Sword nor Ssymbol. */
+ if (s2 != Sword && s2 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the beginning of string ... */
+ if (!AT_STRINGS_BEG (d))
+ {
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_BACKWARD (charpos - 1);
+ #endif
+ s1 = SYNTAX (c1);
+
+ /* ... and S1 is Sword or Ssymbol. */
+ if (s1 == Sword || s1 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
+ case symend:
+ DEBUG_PRINT1 ("EXECUTING symend.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the beginning of string. */
+ if (AT_STRINGS_BEG (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d) - 1;
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ s1 = SYNTAX (c1);
+
+ /* Case 2: S1 is neither Ssymbol nor Sword. */
+ if (s1 != Sword && s1 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the end of string ... */
+ if (!AT_STRINGS_END (d))
+ {
+ PREFETCH_NOLIMIT ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_FORWARD (charpos);
+ #endif
+ s2 = SYNTAX (c2);
+
+ /* ... and S2 is Sword or Ssymbol. */
+ if (s2 == Sword || s2 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
case syntaxspec:
case notsyntaxspec:
not = (re_opcode_t) *(p - 1) == notsyntaxspec;
*** src/search.c.~1~ 2002-05-12 19:04:16.000000000 -0500
--- src/search.c 2004-04-29 17:30:17.000000000 -0500
***************
*** 962,968 ****
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
--- 962,968 ----
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}': case '_':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
*** man/search.texi.~1~ 2002-07-06 08:44:06.000000000 -0500
--- man/search.texi 2004-04-29 17:38:41.000000000 -0500
***************
*** 672,677 ****
--- 672,689 ----
@item \W
matches any character that is not a word-constituent.
+ @item \_<
+ matches the empty string, but only at the beginning of a symbol. A
+ symbol is a sequence of one or more word or symbol constituent
+ characters. @samp{\_<} matches at the beginning of the buffer only if
+ a symbol-constituent character follows.
+
+ @item \_>
+ matches the empty string, but only at the end of a symbol. A symbol
+ is a sequence of one or more word or symbol constituent characters.
+ @samp{\_>} matches at the end of the buffer only if the contents end
+ with a symbol-constituent character.
+
@item \s@var{c}
matches any character whose syntax is @var{c}. Here @var{c} is a
character that designates a particular syntax class: thus, @samp{w}
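The use case from the first message, searching for an identifier and matching exactly that identifier, amounts to wrapping the name in \_< and \_>. The C sketch below counts whole-symbol occurrences of NAME in a string, roughly what a search for \_<NAME\_> would find. It assumes the toy rule from init_syntax_once (alphanumerics and '_' are constituents) rather than a mode's syntax table, and that NAME consists only of constituents. count_whole_symbol is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Toy constituent test following the patch's init_syntax_once:
   alphanumerics (word syntax) and '_' (symbol syntax). */
static bool sym_constituent (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* Hypothetical helper: counts the places where \_<NAME\_> would match
   in TEXT, i.e. NAME occurring as a complete symbol.  NAME must be
   non-empty and made only of constituents, so \_< and \_> reduce to
   "no constituent immediately before / after". */
size_t count_whole_symbol (const char *text, const char *name)
{
  size_t n = strlen (name), len = strlen (text), count = 0;
  assert (n > 0);
  for (size_t i = 0; i + n <= len; i++)
    {
      bool starts = i == 0
        || !sym_constituent ((unsigned char) text[i - 1]);
      bool ends = i + n == len
        || !sym_constituent ((unsigned char) text[i + n]);
      if (starts && ends && memcmp (text + i, name, n) == 0)
        count++;
    }
  return count;
}
```

For example, count_whole_symbol ("foo_bar foo foo_barx xfoo_bar foo_bar", "foo_bar") returns 2: the copies glued to an 'x' on either side are rejected, as they would be by \_<foo_bar\_>.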
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/emacs-devel
* Re: Implement new symbol-start and symbol-end regexp operators
2004-04-29 22:57 Implement new symbol-start and symbol-end regexp operators Jim Blandy
@ 2004-05-01 9:44 ` Richard Stallman
2004-05-04 19:17 ` Jim Blandy
2004-05-19 16:38 ` Stefan Monnier
1 sibling, 1 reply; 14+ messages in thread
From: Richard Stallman @ 2004-05-01 9:44 UTC
Cc: emacs-devel
These do sound useful. Could you write the text for etc/NEWS and
update the Lisp manual?
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-01 9:44 ` Richard Stallman
@ 2004-05-04 19:17 ` Jim Blandy
2004-05-05 6:20 ` Eli Zaretskii
0 siblings, 1 reply; 14+ messages in thread
From: Jim Blandy @ 2004-05-04 19:17 UTC
Cc: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> These do sound useful. Could you write the text for etc/NEWS and
> update the Lisp manual?
Okay. Here is a revised patch:
src/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
Add support for new '\_<' and '\_>' regexp operators, matching the
beginnings and ends of symbols.
* regex.c (enum syntaxcode): Add Ssymbol.
(init_syntax_once): Set the syntax for '_' to Ssymbol, not Sword.
(symbeg, symend): New opcodes.
(print_partial_compiled_pattern): Print the new opcodes properly.
(regex_compile): Parse the new operators.
(analyze_first): symbeg and symend match only the empty string.
(mutually_exclusive_p): symend is mutually exclusive with \s_ and
\sw; symbeg is mutually exclusive with \S_ and \Sw.
(re_match_2_internal): Add code for symbeg and symend.
* search.c (trivial_regexp_p): \_ is no longer a trivial regexp.
man/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
* search.texi (Regexps): Document the \_< and \_> regexp operators.
lispref/ChangeLog:
2004-05-04 Jim Blandy <jimb@redhat.com>
* searching.texi (Regexp Backslash): Document new \_< and \_>
operators.
*** src/regex.c.~2~ 2004-04-29 15:56:53.000000000 -0500
--- src/regex.c 2004-04-29 17:44:24.000000000 -0500
***************
*** 219,225 ****
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
--- 219,225 ----
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
***************
*** 399,405 ****
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Sword;
done = 1;
}
--- 399,405 ----
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Ssymbol;
done = 1;
}
***************
*** 656,661 ****
--- 656,664 ----
wordbound, /* Succeeds if at a word boundary. */
notwordbound, /* Succeeds if not at a word boundary. */
+ symbeg, /* Succeeds if at symbol beginning. */
+ symend, /* Succeeds if at symbol end. */
+
/* Matches any character whose syntax is specified. Followed by
a byte which contains a syntax code, e.g., Sword. */
syntaxspec,
***************
*** 1095,1100 ****
--- 1098,1110 ----
case wordend:
printf ("/wordend");
+ case symbeg:
+ printf ("/symbeg");
+ break;
+
+ case symend:
+ printf ("/symend");
+
case syntaxspec:
printf ("/syntaxspec");
mcnt = *p++;
***************
*** 3135,3140 ****
--- 3145,3163 ----
BUF_PUSH (wordend);
break;
+ case '_':
+ if (syntax & RE_NO_GNU_OPS)
+ goto normal_char;
+ laststart = b;
+ PATFETCH (c);
+ if (c == '<')
+ BUF_PUSH (symbeg);
+ else if (c == '>')
+ BUF_PUSH (symend);
+ else
+ FREE_STACK_RETURN (REG_BADPAT);
+ break;
+
case 'b':
if (syntax & RE_NO_GNU_OPS)
goto normal_char;
***************
*** 3629,3634 ****
--- 3652,3659 ----
case notwordbound:
case wordbeg:
case wordend:
+ case symbeg:
+ case symend:
continue;
***************
*** 4396,4409 ****
break;
case wordend:
! case notsyntaxspec:
return ((re_opcode_t) *p1 == syntaxspec
! && p1[1] == (op2 == wordend ? Sword : p2[1]));
case wordbeg:
! case syntaxspec:
return ((re_opcode_t) *p1 == notsyntaxspec
! && p1[1] == (op2 == wordbeg ? Sword : p2[1]));
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
--- 4421,4440 ----
break;
case wordend:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == Sword);
! case symend:
return ((re_opcode_t) *p1 == syntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case notsyntaxspec:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == p2[1]);
case wordbeg:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == Sword);
! case symbeg:
return ((re_opcode_t) *p1 == notsyntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case syntaxspec:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == p2[1]);
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
***************
*** 5528,5533 ****
--- 5559,5650 ----
}
break;
+ case symbeg:
+ DEBUG_PRINT1 ("EXECUTING symbeg.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the end of string. */
+ if (AT_STRINGS_END (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d);
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ PREFETCH ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ s2 = SYNTAX (c2);
+
+ /* Case 2: S2 is neither Sword nor Ssymbol. */
+ if (s2 != Sword && s2 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the beginning of string ... */
+ if (!AT_STRINGS_BEG (d))
+ {
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_BACKWARD (charpos - 1);
+ #endif
+ s1 = SYNTAX (c1);
+
+ /* ... and S1 is Sword or Ssymbol. */
+ if (s1 == Sword || s1 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
+ case symend:
+ DEBUG_PRINT1 ("EXECUTING symend.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the beginning of string. */
+ if (AT_STRINGS_BEG (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d) - 1;
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ s1 = SYNTAX (c1);
+
+ /* Case 2: S1 is neither Ssymbol nor Sword. */
+ if (s1 != Sword && s1 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the end of string ... */
+ if (!AT_STRINGS_END (d))
+ {
+ PREFETCH_NOLIMIT ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_FORWARD (charpos);
+ #endif
+ s2 = SYNTAX (c2);
+
+ /* ... and S2 is Sword or Ssymbol. */
+ if (s2 == Sword || s2 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
case syntaxspec:
case notsyntaxspec:
not = (re_opcode_t) *(p - 1) == notsyntaxspec;
*** src/search.c.~1~ 2002-05-12 19:04:16.000000000 -0500
--- src/search.c 2004-04-29 17:30:17.000000000 -0500
***************
*** 962,968 ****
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
--- 962,968 ----
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}': case '_':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
*** man/search.texi.~1~ 2002-07-06 08:44:06.000000000 -0500
--- man/search.texi 2004-04-29 17:38:41.000000000 -0500
***************
*** 672,677 ****
--- 672,689 ----
@item \W
matches any character that is not a word-constituent.
+ @item \_<
+ matches the empty string, but only at the beginning of a symbol. A
+ symbol is a sequence of one or more word or symbol constituent
+ characters. @samp{\_<} matches at the beginning of the buffer only if
+ a symbol-constituent character follows.
+
+ @item \_>
+ matches the empty string, but only at the end of a symbol. A symbol
+ is a sequence of one or more word or symbol constituent characters.
+ @samp{\_>} matches at the end of the buffer only if the contents end
+ with a symbol-constituent character.
+
@item \s@var{c}
matches any character whose syntax is @var{c}. Here @var{c} is a
character that designates a particular syntax class: thus, @samp{w}
*** lispref/searching.texi.~1.48.~ 2004-02-16 20:09:15.000000000 -0500
--- lispref/searching.texi 2004-05-04 14:06:36.000000000 -0500
***************
*** 666,671 ****
--- 666,685 ----
with a word-constituent character.
@end table
+ @item \_<
+ @cindex @samp{\_<} in regexp
+ matches the empty string, but only at the beginning of a symbol. A
+ symbol is a sequence of one or more word or symbol constituent
+ characters. @samp{\_<} matches at the beginning of the buffer (or
+ string) only if a symbol-constituent character follows.
+
+ @item \_>
+ @cindex @samp{\_>} in regexp
+ matches the empty string, but only at the end of a symbol. A symbol
+ is a sequence of one or more word or symbol constituent characters.
+ @samp{\_>} matches at the end of the buffer (or string) only if the
+ contents end with a symbol-constituent character.
+
@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
*** etc/NEWS.~1.950.~ 2004-04-27 17:02:27.000000000 -0500
--- etc/NEWS 2004-05-04 14:15:33.000000000 -0500
***************
*** 90,95 ****
--- 90,101 ----
\f
* Changes in Emacs 21.4
+ +++
+ ** There are now two new regular expression operators, \_< and \_>,
+ for matching the beginning and end of a symbol. A symbol is a
+ non-empty sequence of either word or symbol constituent characters, as
+ specified by the syntax table.
+
---
** The IELM prompt is now, by default, read-only. This can be
controlled with the new user option `ielm-prompt-read-only'.
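The phrase "as specified by the syntax table" matters: the same text can split into symbols differently in different modes. The C sketch below contrasts \<NAME\> with \_<NAME\_> under two hypothetical tables, a C-like one where only '_' is a symbol constituent and a Lisp-like one where '-' is one as well. Modelling a syntax table as a function pointer, and all the names used, are assumptions for illustration. NAME is assumed to begin and end with word-constituent characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum syn { SYN_OTHER, SYN_WORD, SYN_SYMBOL };

/* A syntax table modelled as a function from byte to class. */
typedef enum syn (*syntax_table) (unsigned char);

/* Hypothetical C-like table: only '_' is a symbol constituent. */
enum syn c_like (unsigned char c)
{
  if (isalnum (c))
    return SYN_WORD;
  return c == '_' ? SYN_SYMBOL : SYN_OTHER;
}

/* Hypothetical Lisp-like table: '-' is a symbol constituent too. */
enum syn lisp_like (unsigned char c)
{
  if (isalnum (c))
    return SYN_WORD;
  return (c == '_' || c == '-') ? SYN_SYMBOL : SYN_OTHER;
}

/* Is C inside a word (SYMBOLS false) or inside a symbol (SYMBOLS true)? */
static bool inside (syntax_table table, unsigned char c, bool symbols)
{
  enum syn s = table (c);
  return s == SYN_WORD || (symbols && s == SYN_SYMBOL);
}

/* Models \<NAME\> (SYMBOLS false) or \_<NAME\_> (SYMBOLS true) at
   offset POS of TEXT.  NAME is assumed to start and end with word
   constituents, so only the neighbouring characters need checking. */
bool bounded_match (syntax_table table, const char *text, size_t pos,
                    const char *name, bool symbols)
{
  size_t n = strlen (name), len = strlen (text);
  assert (table != NULL && n > 0);
  if (pos > len || n > len - pos || memcmp (text + pos, name, n) != 0)
    return false;
  bool starts = pos == 0
    || !inside (table, (unsigned char) text[pos - 1], symbols);
  bool ends = pos + n == len
    || !inside (table, (unsigned char) text[pos + n], symbols);
  return starts && ends;
}
```

Under c_like, \<foo\> matches at the start of "foo_bar" but \_<foo\_> does not. Under lisp_like, "foo-bar" likewise yields no symbol match for foo, while "(foo bar)" still does.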
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-05 6:20 ` Eli Zaretskii
@ 2004-05-05 6:14 ` Jim Blandy
2004-05-05 20:21 ` Richard Stallman
2004-05-12 17:36 ` Jim Blandy
0 siblings, 2 replies; 14+ messages in thread
From: Jim Blandy @ 2004-05-05 6:14 UTC
Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> > From: Jim Blandy <jimb@redhat.com>
> > Date: 04 May 2004 14:17:44 -0500
> >
> > + @item \_<
> > + @cindex @samp{\_<} in regexp
>
> IMHO, an additional index entry here, something like
>
> @cindex matching symbols in regexp
>
> would be useful.
Okay, I added:
+ @cindex symbols, matching in regexp
src/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
Add support for new '\_<' and '\_>' regexp operators, matching the
beginnings and ends of symbols.
* regex.c (enum syntaxcode): Add Ssymbol.
(init_syntax_once): Set the syntax for '_' to Ssymbol, not Sword.
(symbeg, symend): New opcodes.
(print_partial_compiled_pattern): Print the new opcodes properly.
(regex_compile): Parse the new operators.
(analyze_first): symbeg and symend match only the empty string.
(mutually_exclusive_p): symend is mutually exclusive with \s_ and
\sw; symbeg is mutually exclusive with \S_ and \Sw.
(re_match_2_internal): Add code for symbeg and symend.
* search.c (trivial_regexp_p): \_ is no longer a trivial regexp.
man/ChangeLog:
2004-04-29 Jim Blandy <jimb@redhat.com>
* search.texi (Regexps): Document the \_< and \_> regexp operators.
lispref/ChangeLog:
2004-05-04 Jim Blandy <jimb@redhat.com>
* searching.texi (Regexp Backslash): Document new \_< and \_>
operators.
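The regex_compile hunk accepts \_ only as the first half of \_< or \_>. The decision it makes can be sketched in C in isolation. The enum and function names are hypothetical. The sketch also folds two failure modes into one result: a bad character after \_ (the hunk returns REG_BADPAT) and running off the end of the pattern (presumably reported by PATFETCH in the real code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for this sketch only. */
enum underscore_parse
{
  PARSED_SYMBEG,              /* "\_<" */
  PARSED_SYMEND,              /* "\_>" */
  PARSED_LITERAL_UNDERSCORE,  /* GNU operators disabled: plain '_' */
  PARSE_BAD_PATTERN           /* anything else after "\_" */
};

/* Models the new case '_' in regex_compile.  *P points just past
   "\_" and is advanced past any character consumed. */
enum underscore_parse
parse_backslash_underscore (const char **p, bool no_gnu_ops)
{
  assert (p != NULL && *p != NULL);
  if (no_gnu_ops)
    return PARSED_LITERAL_UNDERSCORE;  /* the hunk's "goto normal_char" */
  char c = **p;
  if (c == '\0')
    return PARSE_BAD_PATTERN;          /* end of pattern: also an error */
  (*p)++;
  if (c == '<')
    return PARSED_SYMBEG;
  if (c == '>')
    return PARSED_SYMEND;
  return PARSE_BAD_PATTERN;            /* the hunk returns REG_BADPAT */
}
```

With RE_NO_GNU_OPS set, the hunk jumps to normal_char, so \_ is just an ordinary '_'. The sketch mirrors that by returning PARSED_LITERAL_UNDERSCORE without consuming the next character.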
*** src/regex.c.~2~ 2004-04-29 15:56:53.000000000 -0500
--- src/regex.c 2004-04-29 17:44:24.000000000 -0500
***************
*** 219,225 ****
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
--- 219,225 ----
/* Define the syntax stuff for \<, \>, etc. */
/* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
! enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
# ifdef SWITCH_ENUM_BUG
# define SWITCH_ENUM_CAST(x) ((int)(x))
***************
*** 399,405 ****
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Sword;
done = 1;
}
--- 399,405 ----
if (ISALNUM (c))
re_syntax_table[c] = Sword;
! re_syntax_table['_'] = Ssymbol;
done = 1;
}
***************
*** 656,661 ****
--- 656,664 ----
wordbound, /* Succeeds if at a word boundary. */
notwordbound, /* Succeeds if not at a word boundary. */
+ symbeg, /* Succeeds if at symbol beginning. */
+ symend, /* Succeeds if at symbol end. */
+
/* Matches any character whose syntax is specified. Followed by
a byte which contains a syntax code, e.g., Sword. */
syntaxspec,
***************
*** 1095,1100 ****
--- 1098,1110 ----
case wordend:
printf ("/wordend");
+ case symbeg:
+ printf ("/symbeg");
+ break;
+
+ case symend:
+ printf ("/symend");
+
case syntaxspec:
printf ("/syntaxspec");
mcnt = *p++;
***************
*** 3135,3140 ****
--- 3145,3163 ----
BUF_PUSH (wordend);
break;
+ case '_':
+ if (syntax & RE_NO_GNU_OPS)
+ goto normal_char;
+ laststart = b;
+ PATFETCH (c);
+ if (c == '<')
+ BUF_PUSH (symbeg);
+ else if (c == '>')
+ BUF_PUSH (symend);
+ else
+ FREE_STACK_RETURN (REG_BADPAT);
+ break;
+
case 'b':
if (syntax & RE_NO_GNU_OPS)
goto normal_char;
***************
*** 3629,3634 ****
--- 3652,3659 ----
case notwordbound:
case wordbeg:
case wordend:
+ case symbeg:
+ case symend:
continue;
***************
*** 4396,4409 ****
break;
case wordend:
! case notsyntaxspec:
return ((re_opcode_t) *p1 == syntaxspec
! && p1[1] == (op2 == wordend ? Sword : p2[1]));
case wordbeg:
! case syntaxspec:
return ((re_opcode_t) *p1 == notsyntaxspec
! && p1[1] == (op2 == wordbeg ? Sword : p2[1]));
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
--- 4421,4440 ----
break;
case wordend:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == Sword);
! case symend:
return ((re_opcode_t) *p1 == syntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case notsyntaxspec:
! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == p2[1]);
case wordbeg:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == Sword);
! case symbeg:
return ((re_opcode_t) *p1 == notsyntaxspec
! && (p1[1] == Ssymbol || p1[1] == Sword));
! case syntaxspec:
! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == p2[1]);
case wordbound:
return (((re_opcode_t) *p1 == notsyntaxspec
***************
*** 5528,5533 ****
--- 5559,5650 ----
}
break;
+ case symbeg:
+ DEBUG_PRINT1 ("EXECUTING symbeg.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the end of string. */
+ if (AT_STRINGS_END (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d);
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ PREFETCH ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ s2 = SYNTAX (c2);
+
+ /* Case 2: S2 is neither Sword nor Ssymbol. */
+ if (s2 != Sword && s2 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the beginning of string ... */
+ if (!AT_STRINGS_BEG (d))
+ {
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_BACKWARD (charpos - 1);
+ #endif
+ s1 = SYNTAX (c1);
+
+ /* ... and S1 is Sword or Ssymbol. */
+ if (s1 == Sword || s1 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
+ case symend:
+ DEBUG_PRINT1 ("EXECUTING symend.\n");
+
+ /* We FAIL in one of the following cases: */
+
+ /* Case 1: D is at the beginning of string. */
+ if (AT_STRINGS_BEG (d))
+ goto fail;
+ else
+ {
+ /* C1 is the character before D, S1 is the syntax of C1, C2
+ is the character at D, and S2 is the syntax of C2. */
+ re_wchar_t c1, c2;
+ int s1, s2;
+ #ifdef emacs
+ int offset = PTR_TO_OFFSET (d) - 1;
+ int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+ UPDATE_SYNTAX_TABLE (charpos);
+ #endif
+ GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
+ s1 = SYNTAX (c1);
+
+ /* Case 2: S1 is neither Ssymbol nor Sword. */
+ if (s1 != Sword && s1 != Ssymbol)
+ goto fail;
+
+ /* Case 3: D is not at the end of string ... */
+ if (!AT_STRINGS_END (d))
+ {
+ PREFETCH_NOLIMIT ();
+ c2 = RE_STRING_CHAR (d, dend - d);
+ #ifdef emacs
+ UPDATE_SYNTAX_TABLE_FORWARD (charpos);
+ #endif
+ s2 = SYNTAX (c2);
+
+ /* ... and S2 is Sword or Ssymbol. */
+ if (s2 == Sword || s2 == Ssymbol)
+ goto fail;
+ }
+ }
+ break;
+
case syntaxspec:
case notsyntaxspec:
not = (re_opcode_t) *(p - 1) == notsyntaxspec;
*** src/search.c.~1~ 2002-05-12 19:04:16.000000000 -0500
--- src/search.c 2004-04-29 17:30:17.000000000 -0500
***************
*** 962,968 ****
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
--- 962,968 ----
{
case '|': case '(': case ')': case '`': case '\'': case 'b':
case 'B': case '<': case '>': case 'w': case 'W': case 's':
! case 'S': case '=': case '{': case '}': case '_':
case 'c': case 'C': /* for categoryspec and notcategoryspec */
case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
*** man/search.texi.~1~ 2002-07-06 08:44:06.000000000 -0500
--- man/search.texi 2004-04-29 17:38:41.000000000 -0500
***************
*** 672,677 ****
--- 672,689 ----
@item \W
matches any character that is not a word-constituent.
+ @item \_<
+ matches the empty string, but only at the beginning of a symbol. A
+ symbol is a sequence of one or more word or symbol constituent
+ characters. @samp{\_<} matches at the beginning of the buffer only if
+ a symbol-constituent character follows.
+
+ @item \_>
+ matches the empty string, but only at the end of a symbol. A symbol
+ is a sequence of one or more word or symbol constituent characters.
+ @samp{\_>} matches at the end of the buffer only if the contents end
+ with a symbol-constituent character.
+
@item \s@var{c}
matches any character whose syntax is @var{c}. Here @var{c} is a
character that designates a particular syntax class: thus, @samp{w}
*** searching.texi.~1.48.~ 2004-02-16 20:09:15.000000000 -0500
--- searching.texi 2004-05-05 01:12:38.000000000 -0500
***************
*** 666,671 ****
--- 666,686 ----
with a word-constituent character.
@end table
+ @item \_<
+ @cindex @samp{\_<} in regexp
+ @cindex symbols, matching in regexp
+ matches the empty string, but only at the beginning of a symbol. A
+ symbol is a sequence of one or more word or symbol constituent
+ characters. @samp{\_<} matches at the beginning of the buffer (or
+ string) only if a symbol-constituent character follows.
+
+ @item \_>
+ @cindex @samp{\_>} in regexp
+ matches the empty string, but only at the end of a symbol. A symbol
+ is a sequence of one or more word or symbol constituent characters.
+ @samp{\_>} matches at the end of the buffer (or string) only if the
+ contents end with a symbol-constituent character.
+
@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
*** etc/NEWS.~1.950.~ 2004-04-27 17:02:27.000000000 -0500
--- etc/NEWS 2004-05-04 14:15:33.000000000 -0500
***************
*** 90,95 ****
--- 90,101 ----
\f
* Changes in Emacs 21.4
+ +++
+ ** There are now two new regular expression operators, \_< and \_>,
+ for matching the beginning and end of a symbol. A symbol is a
+ non-empty sequence of either word or symbol constituent characters, as
+ specified by the syntax table.
+
---
** The IELM prompt is now, by default, read-only. This can be
controlled with the new user option `ielm-prompt-read-only'.
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-04 19:17 ` Jim Blandy
@ 2004-05-05 6:20 ` Eli Zaretskii
2004-05-05 6:14 ` Jim Blandy
0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2004-05-05 6:20 UTC
Cc: emacs-devel
> From: Jim Blandy <jimb@redhat.com>
> Date: 04 May 2004 14:17:44 -0500
>
> + @item \_<
> + @cindex @samp{\_<} in regexp
IMHO, an additional index entry here, something like
@cindex matching symbols in regexp
would be useful.
Thanks.
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-05 6:14 ` Jim Blandy
@ 2004-05-05 20:21 ` Richard Stallman
2004-05-05 21:10 ` Stefan Monnier
2004-05-12 17:36 ` Jim Blandy
1 sibling, 1 reply; 14+ messages in thread
From: Richard Stallman @ 2004-05-05 20:21 UTC
Cc: eliz, emacs-devel
It looks good to me. But it would be good for some people to test
this and report before we install it.
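Since testing was requested, the mutually_exclusive_p entry is a natural thing to check. As we read the surrounding regex.c code (not shown in the hunk), mutually_exclusive_p (P1, P2) should hold only if "P1 matches the next character" implies "P2 fails at this position", which lets a loop P1* followed by P2 run without backtracking. The C sketch below brute-forces that property for \_< and \_> over every string of length 1 to 3 from a toy alphabet ('a' word, '_' symbol, ' ' other). All names are hypothetical, and the model is much simpler than Emacs syntax tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy syntax classes over a three-character alphabet. */
enum syn { SYN_OTHER, SYN_WORD, SYN_SYMBOL };
static const char alphabet[] = "a_ ";   /* word, symbol, other */

static enum syn syn_of (char c)
{
  return c == 'a' ? SYN_WORD : c == '_' ? SYN_SYMBOL : SYN_OTHER;
}

static bool constituent (char c)
{
  return syn_of (c) != SYN_OTHER;
}

/* \_< and \_> at offset POS of S (length LEN). */
static bool symbeg_at (const char *s, size_t len, size_t pos)
{
  return pos < len && constituent (s[pos])
         && (pos == 0 || !constituent (s[pos - 1]));
}

static bool symend_at (const char *s, size_t len, size_t pos)
{
  return pos > 0 && constituent (s[pos - 1])
         && (pos == len || !constituent (s[pos]));
}

enum assertion { SYMBEG, SYMEND };

/* Brute-force check of "P1 matches the next character implies P2
   fails here", where P1 is \sC (NEGATED false) or \SC (NEGATED true)
   for class CLS.  Tries every string of length 1..3 and every
   position that has a next character.  Hypothetical helper. */
bool exclusive_p (bool negated, enum syn cls, enum assertion p2)
{
  char s[3];
  assert (cls == SYN_OTHER || cls == SYN_WORD || cls == SYN_SYMBOL);
  for (size_t len = 1; len <= 3; len++)
    {
      size_t total = 1;
      for (size_t i = 0; i < len; i++)
        total *= 3;
      for (size_t n = 0; n < total; n++)
        {
          /* Decode N in base 3 into a string of length LEN. */
          size_t k = n;
          for (size_t i = 0; i < len; i++, k /= 3)
            s[i] = alphabet[k % 3];
          for (size_t pos = 0; pos < len; pos++)
            {
              bool p1 = (syn_of (s[pos]) == cls) != negated;
              bool p2_ok = p2 == SYMBEG ? symbeg_at (s, len, pos)
                                        : symend_at (s, len, pos);
              if (p1 && p2_ok)
                return false;   /* counterexample: both can succeed */
            }
        }
    }
  return true;
}
```

On this model the symend half of the ChangeLog entry checks out: exclusive_p (false, SYN_WORD, SYMEND) and exclusive_p (false, SYN_SYMBOL, SYMEND) are both true. The symbeg half does not: \S_ still matches a word constituent such as 'a', and \_< can succeed just before it, so exclusive_p (true, SYN_SYMBOL, SYMBEG) is false, and the same goes for \Sw with '_'. If this reading is right, the symbeg case could make a pattern like \S_*\_< miss matches and may deserve a second look.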
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-05 20:21 ` Richard Stallman
@ 2004-05-05 21:10 ` Stefan Monnier
0 siblings, 0 replies; 14+ messages in thread
From: Stefan Monnier @ 2004-05-05 21:10 UTC
Cc: Jim Blandy, eliz, emacs-devel
> It looks good to me. But it would be good for some people to test
> this and report before we install it.
Looks fine to me too. Can't test it just now, tho.
Stefan
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-05 6:14 ` Jim Blandy
2004-05-05 20:21 ` Richard Stallman
@ 2004-05-12 17:36 ` Jim Blandy
1 sibling, 0 replies; 14+ messages in thread
From: Jim Blandy @ 2004-05-12 17:36 UTC
Has anyone had a chance to try this patch out?
Jim Blandy <jimb@redhat.com> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > > From: Jim Blandy <jimb@redhat.com>
> > > Date: 04 May 2004 14:17:44 -0500
> > >
> > > + @item \_<
> > > + @cindex @samp{\_<} in regexp
> >
> > IMHO, an additional index entry here, something like
> >
> > @cindex matching symbols in regexp
> >
> > would be useful.
>
> Okay, I added:
>
> + @cindex symbols, matching in regexp
>
>
> src/ChangeLog:
> 2004-04-29 Jim Blandy <jimb@redhat.com>
>
> Add support for new '\_<' and '\_>' regexp operators, matching the
> beginning and ends of symbols.
> * regex.c (enum syntaxcode): Add Ssymbol.
> (init_syntax_once): Set the syntax for '_' to Ssymbol, not Sword.
> (symbeg, symend): New opcodes.
> (print_partial_compiled_pattern): Print the new opcodes properly.
> (regex_compile): Parse the new operators.
> (analyze_first): symbeg and symend match only the empty string.
> (mutually_exclusive_p): symend is mutually exclusive with \s_ and
> \sw; symbeg is mutually exclusive with \S_ and \Sw.
> (re_match_2_internal): Add code for symbeg and symend.
> * search.c (trivial_regexp_p): \_ is no longer a trivial regexp.
>
> man/ChangeLog:
> 2004-04-29 Jim Blandy <jimb@redhat.com>
>
> * search.texi (Regexps): Document the \_< and \_> regexp operators.
>
> lispref/ChangeLog:
> 2004-05-04 Jim Blandy <jimb@redhat.com>
>
> * searching.texi (Regexp Backslash): Document new \_< and \_>
> operators.
>
> *** src/regex.c.~2~ 2004-04-29 15:56:53.000000000 -0500
> --- src/regex.c 2004-04-29 17:44:24.000000000 -0500
> ***************
> *** 219,225 ****
> /* Define the syntax stuff for \<, \>, etc. */
>
> /* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
> ! enum syntaxcode { Swhitespace = 0, Sword = 1 };
>
> # ifdef SWITCH_ENUM_BUG
> # define SWITCH_ENUM_CAST(x) ((int)(x))
> --- 219,225 ----
> /* Define the syntax stuff for \<, \>, etc. */
>
> /* Sword must be nonzero for the wordchar pattern commands in re_match_2. */
> ! enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
>
> # ifdef SWITCH_ENUM_BUG
> # define SWITCH_ENUM_CAST(x) ((int)(x))
> ***************
> *** 399,405 ****
> if (ISALNUM (c))
> re_syntax_table[c] = Sword;
>
> ! re_syntax_table['_'] = Sword;
>
> done = 1;
> }
> --- 399,405 ----
> if (ISALNUM (c))
> re_syntax_table[c] = Sword;
>
> ! re_syntax_table['_'] = Ssymbol;
>
> done = 1;
> }
> ***************
> *** 656,661 ****
> --- 656,664 ----
> wordbound, /* Succeeds if at a word boundary. */
> notwordbound, /* Succeeds if not at a word boundary. */
>
> + symbeg, /* Succeeds if at symbol beginning. */
> + symend, /* Succeeds if at symbol end. */
> +
> /* Matches any character whose syntax is specified. Followed by
> a byte which contains a syntax code, e.g., Sword. */
> syntaxspec,
> ***************
> *** 1095,1100 ****
> --- 1098,1111 ----
> case wordend:
> printf ("/wordend");
>
> + case symbeg:
> + printf ("/symbeg");
> + break;
> +
> + case symend:
> + printf ("/symend");
> + break;
> +
> case syntaxspec:
> printf ("/syntaxspec");
> mcnt = *p++;
> ***************
> *** 3135,3140 ****
> --- 3145,3163 ----
> BUF_PUSH (wordend);
> break;
>
> + case '_':
> + if (syntax & RE_NO_GNU_OPS)
> + goto normal_char;
> + laststart = b;
> + PATFETCH (c);
> + if (c == '<')
> + BUF_PUSH (symbeg);
> + else if (c == '>')
> + BUF_PUSH (symend);
> + else
> + FREE_STACK_RETURN (REG_BADPAT);
> + break;
> +
> case 'b':
> if (syntax & RE_NO_GNU_OPS)
> goto normal_char;
> ***************
> *** 3629,3634 ****
> --- 3652,3659 ----
> case notwordbound:
> case wordbeg:
> case wordend:
> + case symbeg:
> + case symend:
> continue;
>
>
> ***************
> *** 4396,4409 ****
> break;
>
> case wordend:
> ! case notsyntaxspec:
> return ((re_opcode_t) *p1 == syntaxspec
> ! && p1[1] == (op2 == wordend ? Sword : p2[1]));
>
> case wordbeg:
> ! case syntaxspec:
> return ((re_opcode_t) *p1 == notsyntaxspec
> ! && p1[1] == (op2 == wordbeg ? Sword : p2[1]));
>
> case wordbound:
> return (((re_opcode_t) *p1 == notsyntaxspec
> --- 4421,4440 ----
> break;
>
> case wordend:
> ! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == Sword);
> ! case symend:
> return ((re_opcode_t) *p1 == syntaxspec
> ! && (p1[1] == Ssymbol || p1[1] == Sword));
> ! case notsyntaxspec:
> ! return ((re_opcode_t) *p1 == syntaxspec && p1[1] == p2[1]);
>
> case wordbeg:
> ! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == Sword);
> ! case symbeg:
> return ((re_opcode_t) *p1 == notsyntaxspec
> ! && (p1[1] == Ssymbol || p1[1] == Sword));
> ! case syntaxspec:
> ! return ((re_opcode_t) *p1 == notsyntaxspec && p1[1] == p2[1]);
>
> case wordbound:
> return (((re_opcode_t) *p1 == notsyntaxspec
> ***************
> *** 5528,5533 ****
> --- 5559,5650 ----
> }
> break;
>
> + case symbeg:
> + DEBUG_PRINT1 ("EXECUTING symbeg.\n");
> +
> + /* We FAIL in one of the following cases: */
> +
> + /* Case 1: D is at the end of string. */
> + if (AT_STRINGS_END (d))
> + goto fail;
> + else
> + {
> + /* C1 is the character before D, S1 is the syntax of C1, C2
> + is the character at D, and S2 is the syntax of C2. */
> + re_wchar_t c1, c2;
> + int s1, s2;
> + #ifdef emacs
> + int offset = PTR_TO_OFFSET (d);
> + int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
> + UPDATE_SYNTAX_TABLE (charpos);
> + #endif
> + PREFETCH ();
> + c2 = RE_STRING_CHAR (d, dend - d);
> + s2 = SYNTAX (c2);
> +
> + /* Case 2: S2 is neither Sword nor Ssymbol. */
> + if (s2 != Sword && s2 != Ssymbol)
> + goto fail;
> +
> + /* Case 3: D is not at the beginning of string ... */
> + if (!AT_STRINGS_BEG (d))
> + {
> + GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
> + #ifdef emacs
> + UPDATE_SYNTAX_TABLE_BACKWARD (charpos - 1);
> + #endif
> + s1 = SYNTAX (c1);
> +
> + /* ... and S1 is Sword or Ssymbol. */
> + if (s1 == Sword || s1 == Ssymbol)
> + goto fail;
> + }
> + }
> + break;
> +
> + case symend:
> + DEBUG_PRINT1 ("EXECUTING symend.\n");
> +
> + /* We FAIL in one of the following cases: */
> +
> + /* Case 1: D is at the beginning of string. */
> + if (AT_STRINGS_BEG (d))
> + goto fail;
> + else
> + {
> + /* C1 is the character before D, S1 is the syntax of C1, C2
> + is the character at D, and S2 is the syntax of C2. */
> + re_wchar_t c1, c2;
> + int s1, s2;
> + #ifdef emacs
> + int offset = PTR_TO_OFFSET (d) - 1;
> + int charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
> + UPDATE_SYNTAX_TABLE (charpos);
> + #endif
> + GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
> + s1 = SYNTAX (c1);
> +
> + /* Case 2: S1 is neither Ssymbol nor Sword. */
> + if (s1 != Sword && s1 != Ssymbol)
> + goto fail;
> +
> + /* Case 3: D is not at the end of string ... */
> + if (!AT_STRINGS_END (d))
> + {
> + PREFETCH_NOLIMIT ();
> + c2 = RE_STRING_CHAR (d, dend - d);
> + #ifdef emacs
> + UPDATE_SYNTAX_TABLE_FORWARD (charpos);
> + #endif
> + s2 = SYNTAX (c2);
> +
> + /* ... and S2 is Sword or Ssymbol. */
> + if (s2 == Sword || s2 == Ssymbol)
> + goto fail;
> + }
> + }
> + break;
> +
> case syntaxspec:
> case notsyntaxspec:
> not = (re_opcode_t) *(p - 1) == notsyntaxspec;
> *** src/search.c.~1~ 2002-05-12 19:04:16.000000000 -0500
> --- src/search.c 2004-04-29 17:30:17.000000000 -0500
> ***************
> *** 962,968 ****
> {
> case '|': case '(': case ')': case '`': case '\'': case 'b':
> case 'B': case '<': case '>': case 'w': case 'W': case 's':
> ! case 'S': case '=': case '{': case '}':
> case 'c': case 'C': /* for categoryspec and notcategoryspec */
> case '1': case '2': case '3': case '4': case '5':
> case '6': case '7': case '8': case '9':
> --- 962,968 ----
> {
> case '|': case '(': case ')': case '`': case '\'': case 'b':
> case 'B': case '<': case '>': case 'w': case 'W': case 's':
> ! case 'S': case '=': case '{': case '}': case '_':
> case 'c': case 'C': /* for categoryspec and notcategoryspec */
> case '1': case '2': case '3': case '4': case '5':
> case '6': case '7': case '8': case '9':
> *** man/search.texi.~1~ 2002-07-06 08:44:06.000000000 -0500
> --- man/search.texi 2004-04-29 17:38:41.000000000 -0500
> ***************
> *** 672,677 ****
> --- 672,689 ----
> @item \W
> matches any character that is not a word-constituent.
>
> + @item \_<
> + matches the empty string, but only at the beginning of a symbol. A
> + symbol is a sequence of one or more word or symbol constituent
> + characters. @samp{\_<} matches at the beginning of the buffer only if
> + a symbol-constituent character follows.
> +
> + @item \_>
> + matches the empty string, but only at the end of a symbol. A symbol
> + is a sequence of one or more word or symbol constituent characters.
> + @samp{\_>} matches at the end of the buffer only if the contents end
> + with a symbol-constituent character.
> +
> @item \s@var{c}
> matches any character whose syntax is @var{c}. Here @var{c} is a
> character that designates a particular syntax class: thus, @samp{w}
> *** searching.texi.~1.48.~ 2004-02-16 20:09:15.000000000 -0500
> --- searching.texi 2004-05-05 01:12:38.000000000 -0500
> ***************
> *** 666,671 ****
> --- 666,686 ----
> with a word-constituent character.
> @end table
>
> + @item \_<
> + @cindex @samp{\_<} in regexp
> + @cindex symbols, matching in regexp
> + matches the empty string, but only at the beginning of a symbol. A
> + symbol is a sequence of one or more word or symbol constituent
> + characters. @samp{\_<} matches at the beginning of the buffer (or
> + string) only if a symbol-constituent character follows.
> +
> + @item \_>
> + @cindex @samp{\_>} in regexp
> + matches the empty string, but only at the end of a symbol. A symbol
> + is a sequence of one or more word or symbol constituent characters.
> + @samp{\_>} matches at the end of the buffer (or string) only if the
> + contents end with a symbol-constituent character.
> +
> @kindex invalid-regexp
> Not every string is a valid regular expression. For example, a string
> with unbalanced square brackets is invalid (with a few exceptions, such
> *** etc/NEWS.~1.950.~ 2004-04-27 17:02:27.000000000 -0500
> --- etc/NEWS 2004-05-04 14:15:33.000000000 -0500
> ***************
> *** 90,95 ****
> --- 90,101 ----
> \f
> * Changes in Emacs 21.4
>
> + +++
> + ** There are now two new regular expression operators, \_< and \_>,
> + for matching the beginning and end of a symbol. A symbol is a
> + non-empty sequence of either word or symbol constituent characters, as
> + specified by the syntax table.
> +
> ---
> ** The IELM prompt is now, by default, read-only. This can be
> controlled with the new user option `ielm-prompt-read-only'.
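As a rough illustration of the matcher logic in the symbeg and symend cases of re_match_2_internal above, here is a standalone sketch over a plain byte string. It assumes the non-Emacs default syntax from the patched init_syntax_once (alphanumerics are Sword, '_' is Ssymbol) in place of Emacs's SYNTAX macro and syntax-table updates. The function names are hypothetical and do not appear in regex.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Syntax codes as in the patched enum syntaxcode.  */
enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };

/* Hypothetical stand-in for SYNTAX (c), modeled on the patched
   init_syntax_once: alphanumerics are Sword, '_' is Ssymbol, and
   everything else is Swhitespace.  */
static enum syntaxcode
toy_syntax (unsigned char c)
{
  if (isalnum (c))
    return Sword;
  if (c == '_')
    return Ssymbol;
  return Swhitespace;
}

/* A symbol constituent is any character with word OR symbol syntax.  */
static bool
symbol_constituent (unsigned char c)
{
  enum syntaxcode s = toy_syntax (c);
  return s == Sword || s == Ssymbol;
}

/* Mirrors the symbeg case: fail at the end of the string (case 1),
   fail if the character at POS is not a constituent (case 2), and
   fail if a constituent precedes POS (case 3).  */
static bool
at_symbol_beg (const char *str, size_t len, size_t pos)
{
  assert (pos <= len);
  if (pos == len)
    return false;
  if (!symbol_constituent ((unsigned char) str[pos]))
    return false;
  return pos == 0 || !symbol_constituent ((unsigned char) str[pos - 1]);
}

/* Mirrors the symend case: fail at the beginning of the string,
   fail if the character before POS is not a constituent, and fail
   if a constituent follows POS.  */
static bool
at_symbol_end (const char *str, size_t len, size_t pos)
{
  assert (pos <= len);
  if (pos == 0)
    return false;
  if (!symbol_constituent ((unsigned char) str[pos - 1]))
    return false;
  return pos == len || !symbol_constituent ((unsigned char) str[pos]);
}
```

For "foo_bar (x-1)" (length 13), at_symbol_beg holds only at offsets 0, 9 and 11, and at_symbol_end only at 7, 10 and 12. Under a syntax table that gives '-' symbol syntax, "x-1" would instead be a single symbol, which is why the real matcher consults the buffer's syntax table rather than a fixed one.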
* Re: Implement new symbol-start and symbol-end regexp operators
2004-04-29 22:57 Implement new symbol-start and symbol-end regexp operators Jim Blandy
2004-05-01 9:44 ` Richard Stallman
@ 2004-05-19 16:38 ` Stefan Monnier
2004-06-10 0:32 ` Luc Teirlinck
1 sibling, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2004-05-19 16:38 UTC (permalink / raw)
Cc: emacs-devel
> I find these really useful --- finally I can isearch-regexp for
> identifiers in code and match exactly what I mean. Synthesizing these
> operators from the existing ones is a pain, partly because "symbol
> constituents" are those characters whose syntax is *either* 'word' or
> 'symbol'.
Installed,
Stefan
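The difficulty Jim describes can be illustrated with a sketch that compares a \<-style (wordbeg) test with a \_<-style (symbeg) test. It assumes a plain ASCII string and a hypothetical toy syntax table modeled on the patched init_syntax_once ('_' has symbol syntax). In Emacs proper the buffer's syntax table decides. All function names here are illustrative, not regex.c's.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };

/* Hypothetical toy syntax table, as in the patched init_syntax_once:
   alphanumerics are Sword, '_' is Ssymbol, all else Swhitespace.  */
static enum syntaxcode
char_syntax (unsigned char c)
{
  if (isalnum (c))
    return Sword;
  return c == '_' ? Ssymbol : Swhitespace;
}

/* \< : a word-syntax character follows POS and none precedes it.  */
static bool
word_beg (const char *s, size_t len, size_t pos)
{
  assert (pos <= len);
  if (pos == len || char_syntax ((unsigned char) s[pos]) != Sword)
    return false;
  return pos == 0 || char_syntax ((unsigned char) s[pos - 1]) != Sword;
}

/* \_< uses the same test, except that "constituent" means Sword OR
   Ssymbol, a disjunction no single \sC operator can express.  */
static bool
is_symbol_char (unsigned char c)
{
  enum syntaxcode syn = char_syntax (c);
  return syn == Sword || syn == Ssymbol;
}

static bool
symbol_beg (const char *s, size_t len, size_t pos)
{
  assert (pos <= len);
  if (pos == len || !is_symbol_char ((unsigned char) s[pos]))
    return false;
  return pos == 0 || !is_symbol_char ((unsigned char) s[pos - 1]);
}

/* Count the positions 0..LEN at which boundary predicate PRED holds.  */
static size_t
count_boundaries (bool (*pred) (const char *, size_t, size_t),
                  const char *s, size_t len)
{
  size_t n = 0;
  for (size_t pos = 0; pos <= len; pos++)
    if (pred (s, len, pos))
      n++;
  return n;
}
```

In "foo_bar", word_beg holds at offsets 0 and 4, so \<bar would match inside the identifier, while symbol_beg holds only at 0. In "_tmp", \< first matches at offset 1 but \_< matches at offset 0.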
* Re: Implement new symbol-start and symbol-end regexp operators
2004-05-19 16:38 ` Stefan Monnier
@ 2004-06-10 0:32 ` Luc Teirlinck
2004-07-16 2:38 ` Luc Teirlinck
0 siblings, 1 reply; 14+ messages in thread
From: Luc Teirlinck @ 2004-06-10 0:32 UTC (permalink / raw)
Cc: jimb, emacs-devel
Stefan Monnier wrote:
> > I find these really useful --- finally I can isearch-regexp for
> > identifiers in code and match exactly what I mean. Synthesizing these
> > operators from the existing ones is a pain, partly because "symbol
> > constituents" are those characters whose syntax is *either* 'word' or
> > 'symbol'.
> Installed,
You seem to have installed Jim's changes to search.c and regex.c, but
not to searching.texi nor, as far as I can tell, to NEWS. As a
result, the changes appear to be completely undocumented.
Is there a reason for that?
Sincerely,
Luc.
* Re: Implement new symbol-start and symbol-end regexp operators
2004-06-10 0:32 ` Luc Teirlinck
@ 2004-07-16 2:38 ` Luc Teirlinck
2004-07-16 2:45 ` Luc Teirlinck
2004-07-16 12:43 ` Stefan
0 siblings, 2 replies; 14+ messages in thread
From: Luc Teirlinck @ 2004-07-16 2:38 UTC (permalink / raw)
Cc: jimb, monnier, emacs-devel
I asked this before, but apparently the documentation for this has not
been updated, as far as I can tell:
Stefan Monnier wrote:
> > I find these really useful --- finally I can isearch-regexp for
> > identifiers in code and match exactly what I mean. Synthesizing these
> > operators from the existing ones is a pain, partly because "symbol
> > constituents" are those characters whose syntax is *either* 'word' or
> > 'symbol'.
> Installed,
You seem to have installed Jim's changes to search.c and regex.c, but
not to searching.texi nor, as far as I can tell, to NEWS. As a
result, the changes appear to be completely undocumented.
Is there a reason for that?
* Re: Implement new symbol-start and symbol-end regexp operators
2004-07-16 2:38 ` Luc Teirlinck
@ 2004-07-16 2:45 ` Luc Teirlinck
2004-07-16 12:43 ` Stefan
1 sibling, 0 replies; 14+ messages in thread
From: Luc Teirlinck @ 2004-07-16 2:45 UTC (permalink / raw)
Cc: jimb, monnier, emacs-devel
I meant to address the previous message to Stefan, but accidentally
put myself in the "To" field.
Sincerely,
Luc.
* Re: Implement new symbol-start and symbol-end regexp operators
2004-07-16 2:38 ` Luc Teirlinck
2004-07-16 2:45 ` Luc Teirlinck
@ 2004-07-16 12:43 ` Stefan
2004-07-16 17:49 ` Luc Teirlinck
1 sibling, 1 reply; 14+ messages in thread
From: Stefan @ 2004-07-16 12:43 UTC (permalink / raw)
Cc: jimb, emacs-devel
> You seem to have installed Jim's changes to search.c and regex.c, but
> not to searching.texi, nor, in as far as I can tell, to the NEWS. As a
> result, the changes are, in as far as I can tell, completely undocumented.
> Is there a reason for that?
No,
Stefan
* Re: Implement new symbol-start and symbol-end regexp operators
2004-07-16 12:43 ` Stefan
@ 2004-07-16 17:49 ` Luc Teirlinck
0 siblings, 0 replies; 14+ messages in thread
From: Luc Teirlinck @ 2004-07-16 17:49 UTC (permalink / raw)
Cc: jimb, emacs-devel
I will check the changes and then commit them. I first wanted to make
sure you had not left them out deliberately. There actually is a
problem (`@end table' is in the wrong position), but it seems trivial
to correct.
Sincerely,
Luc.
Thread overview: 14+ messages
2004-04-29 22:57 Implement new symbol-start and symbol-end regexp operators Jim Blandy
2004-05-01 9:44 ` Richard Stallman
2004-05-04 19:17 ` Jim Blandy
2004-05-05 6:20 ` Eli Zaretskii
2004-05-05 6:14 ` Jim Blandy
2004-05-05 20:21 ` Richard Stallman
2004-05-05 21:10 ` Stefan Monnier
2004-05-12 17:36 ` Jim Blandy
2004-05-19 16:38 ` Stefan Monnier
2004-06-10 0:32 ` Luc Teirlinck
2004-07-16 2:38 ` Luc Teirlinck
2004-07-16 2:45 ` Luc Teirlinck
2004-07-16 12:43 ` Stefan
2004-07-16 17:49 ` Luc Teirlinck