* bug#5393: control message for bug 5393
[not found] <E1RpP6b-00077v-6Q@fencepost.gnu.org>
@ 2016-02-28 6:23 ` Lars Ingebrigtsen
2016-02-28 10:25 ` Stephen Berman
2016-02-28 18:22 ` Glenn Morris
0 siblings, 2 replies; 5+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-28 6:23 UTC (permalink / raw)
To: Glenn Morris; +Cc: 5393
Glenn Morris <rgm@gnu.org> writes:
> forwarded 5393 http://lists.gnu.org/archive/html/emacs-devel/2012-01/msg00732.html
I was unable to find the patch in question on that link... Anybody?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#5393: control message for bug 5393
2016-02-28 6:23 ` bug#5393: control message for bug 5393 Lars Ingebrigtsen
@ 2016-02-28 10:25 ` Stephen Berman
2016-02-29 2:30 ` Lars Ingebrigtsen
2016-02-28 18:22 ` Glenn Morris
1 sibling, 1 reply; 5+ messages in thread
From: Stephen Berman @ 2016-02-28 10:25 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 5393
On Sun, 28 Feb 2016 17:23:19 +1100 Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Glenn Morris <rgm@gnu.org> writes:
>
>> forwarded 5393 http://lists.gnu.org/archive/html/emacs-devel/2012-01/msg00732.html
>
> I was unable to find the patch in question on that link... Anybody?
I think it's here:
http://lists.gnu.org/archive/html/emacs-devel/2009-06/msg00094.html
Steve Berman
* bug#5393: control message for bug 5393
2016-02-28 6:23 ` bug#5393: control message for bug 5393 Lars Ingebrigtsen
2016-02-28 10:25 ` Stephen Berman
@ 2016-02-28 18:22 ` Glenn Morris
1 sibling, 0 replies; 5+ messages in thread
From: Glenn Morris @ 2016-02-28 18:22 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 5393
Lars Ingebrigtsen wrote:
> Glenn Morris <rgm@gnu.org> writes:
>
>> forwarded 5393
>> http://lists.gnu.org/archive/html/emacs-devel/2012-01/msg00732.html
>
> I was unable to find the patch in question on that link... Anybody?
I just use "forwarded" to link to a place with more recent discussion.
The patch is in the original bug report, the only message in the report.
* bug#5393: control message for bug 5393
2016-02-28 10:25 ` Stephen Berman
@ 2016-02-29 2:30 ` Lars Ingebrigtsen
2019-06-27 17:41 ` Lars Ingebrigtsen
0 siblings, 1 reply; 5+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29 2:30 UTC (permalink / raw)
To: Stephen Berman; +Cc: 5393
Stephen Berman <stephen.berman@gmx.net> writes:
> I think it's here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2009-06/msg00094.html
I have included the patch below for easier reading.
Lookahead and lookbehind are useful, I guess, but the regex code in
Emacs is a mystery to me, so I can't really judge the patch.
Index: src/regex.c
===================================================================
RCS file: /sources/emacs/emacs/src/regex.c,v
retrieving revision 1.236
diff -u -r1.236 regex.c
--- src/regex.c 8 Jan 2009 03:15:54 -0000 1.236
+++ src/regex.c 3 Jun 2009 22:35:17 -0000
@@ -735,7 +735,14 @@
syntaxspec,
/* Matches any character whose syntax is not that specified. */
- notsyntaxspec
+ notsyntaxspec,
+
+ lookahead,
+ lookahead_not,
+ lookbehind,
+ lookbehind_not,
+ lookaround_succeed,
+ lookaround_fail
#ifdef emacs
,before_dot, /* Succeeds if before point. */
@@ -1033,6 +1040,36 @@
fprintf (stderr, "/stop_memory/%d", *p++);
break;
+ case lookahead:
+ extract_number_and_incr (&mcnt, &p);
+ fprintf (stderr, "/lookahead/%d", mcnt);
+ break;
+
+ case lookahead_not:
+ extract_number_and_incr (&mcnt, &p);
+ fprintf (stderr, "/lookahead_not/%d", mcnt);
+ break;
+
+ case lookbehind:
+ extract_number_and_incr (&mcnt, &p);
+ extract_number_and_incr (&mcnt2, &p);
+ fprintf (stderr, "/lookbehind/%d/%d", mcnt, mcnt2);
+ break;
+
+ case lookbehind_not:
+ extract_number_and_incr (&mcnt, &p);
+ extract_number_and_incr (&mcnt2, &p);
+ fprintf (stderr, "/lookbehind_not/%d/%d", mcnt, mcnt2);
+ break;
+
+ case lookaround_succeed:
+ fprintf (stderr, "/lookaround_succeed");
+ break;
+
+ case lookaround_fail:
+ fprintf (stderr, "/lookaround_fail");
+ break;
+
case duplicate:
fprintf (stderr, "/duplicate/%d", *p++);
break;
@@ -1600,11 +1637,17 @@
} \
else \
{ \
- regend[reg] = POP_FAILURE_POINTER (); \
- regstart[reg] = POP_FAILURE_POINTER (); \
- DEBUG_PRINT4 (" Pop reg %d (spanning %p -> %p)\n", \
- reg, regstart[reg], regend[reg]); \
- } \
+ re_char *start, *end; \
+ end = POP_FAILURE_POINTER (); \
+ start = POP_FAILURE_POINTER (); \
+ if (!discard_saved_regs) \
+ { \
+ regstart[reg] = start; \
+ regend[reg] = end; \
+ DEBUG_PRINT4 (" Pop reg %d (spanning %p -> %p)\n", \
+ reg, regstart[reg], regend[reg]); \
+ } \
+ } \
} while (0)
/* Check that we are not stuck in an infinite loop. */
@@ -1702,7 +1745,7 @@
while (fail_stack.frame < fail_stack.avail) \
POP_FAILURE_REG_OR_COUNT (); \
\
- pat = POP_FAILURE_POINTER (); \
+ pat = POP_FAILURE_POINTER (); \
DEBUG_PRINT2 (" Popping pattern %p: ", pat); \
DEBUG_PRINT_COMPILED_PATTERN (bufp, pat, pend); \
\
@@ -1724,6 +1767,29 @@
} while (0) /* POP_FAILURE_POINT */
+#define FINISH_LOOKAROUND() \
+ do { \
+ re_char *str, *pat; \
+ re_opcode_t op; \
+ discard_saved_regs = 1; \
+ while (!FAIL_STACK_EMPTY ()) \
+ { \
+ POP_FAILURE_POINT (str, pat); \
+ op = (re_opcode_t) *pat; \
+ if (op == lookahead \
+ || op == lookahead_not \
+ || op == lookbehind \
+ || op == lookbehind_not) \
+ { \
+ d = str; \
+ dend = ((d >= string1 && d <= end1) \
+ ? end_match_1 : end_match_2); \
+ break; \
+ } \
+ } \
+ discard_saved_regs = 0; \
+ } while (0);
+
\f
/* Registers are set to a sentinel when they haven't yet matched. */
#define REG_UNSET(e) ((e) == NULL)
@@ -1922,6 +1988,7 @@
pattern_offset_t fixup_alt_jump;
pattern_offset_t laststart_offset;
regnum_t regnum;
+ int lookaround;
} compile_stack_elt_t;
@@ -2522,6 +2589,8 @@
compile_stack,
regnum_t regnum));
+static int exact_chars_in_pattern_buffer _RE_ARGS ((struct re_pattern_buffer *bufp, re_char *p, re_char *pend));
+
/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
Returns one of error codes defined in `regex.h', or zero for success.
@@ -3261,6 +3330,7 @@
handle_open:
{
int shy = 0;
+ int lookaround = 0;
regnum_t regnum = 0;
if (p+1 < pend)
{
@@ -3282,6 +3352,27 @@
case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
regnum = 10*regnum + (c - '0'); break;
+ case '=':
+ /* Positive lookahead assertion. */
+ shy = lookaround = 1;
+ break;
+ case '!':
+ /* Negative lookahead assertion. */
+ shy = lookaround = 2;
+ break;
+ case '<':
+ {
+ PATFETCH (c);
+ if (c == '=')
+ /* Positive lookbehind assertion. */
+ shy = lookaround = -1;
+ else if (c == '!')
+ /* Negative lookbehind assertion. */
+ shy = lookaround = -2;
+ else
+ FREE_STACK_RETURN (REG_BADPAT);
+ }
+ break;
default:
/* Only (?:...) is supported right now. */
FREE_STACK_RETURN (REG_BADPAT);
@@ -3328,6 +3419,7 @@
= fixup_alt_jump ? fixup_alt_jump - bufp->buffer + 1 : 0;
COMPILE_STACK_TOP.laststart_offset = b - bufp->buffer;
COMPILE_STACK_TOP.regnum = regnum;
+ COMPILE_STACK_TOP.lookaround = lookaround;
/* Do not push a start_memory for groups beyond the last one
we can represent in the compiled pattern. */
@@ -3377,6 +3469,7 @@
later groups should continue to be numbered higher,
as in `(ab)c(de)' -- the second group is #2. */
regnum_t regnum;
+ int lookaround;
compile_stack.avail--;
begalt = bufp->buffer + COMPILE_STACK_TOP.begalt_offset;
@@ -3389,13 +3482,40 @@
/* If we've reached MAX_REGNUM groups, then this open
won't actually generate any code, so we'll have to
clear pending_exact explicitly. */
+ lookaround = COMPILE_STACK_TOP.lookaround;
pending_exact = 0;
/* We're at the end of the group, so now we know how many
groups were inside this one. */
if (regnum <= MAX_REGNUM && regnum > 0)
BUF_PUSH_2 (stop_memory, regnum);
- }
+ else if (lookaround)
+ {
+ if (lookaround > 0)
+ {
+ /* Positive/negative lookahead assertion. */
+ GET_BUFFER_SPACE (3);
+ INSERT_JUMP (lookaround == 1 ? lookahead : lookahead_not, laststart, b + 4);
+ b += 3;
+ }
+ else
+ {
+ /* Positive/negative lookbehind assertion. */
+ int count = exact_chars_in_pattern_buffer (bufp, laststart, b);
+ if (count == -1) /* variable length */
+ FREE_STACK_RETURN (REG_BADPAT);
+
+ GET_BUFFER_SPACE (5);
+ INSERT_JUMP2 (lookaround == -1 ? lookbehind : lookbehind_not, laststart, b + 6, count);
+ b += 5;
+ }
+
+ /* Negative form. */
+ if (lookaround > 1 || lookaround < -1)
+ BUF_PUSH (lookaround_fail);
+ BUF_PUSH (lookaround_succeed);
+ }
+ }
break;
@@ -3949,10 +4069,16 @@
/* After an alternative? */
|| (*prev == '|' && (syntax & RE_NO_BK_VBAR || prev_prev_backslash))
/* After a shy subexpression? */
- || ((syntax & RE_SHY_GROUPS) && prev - 2 >= pattern
- && prev[-1] == '?' && prev[-2] == '('
- && (syntax & RE_NO_BK_PARENS
- || (prev - 3 >= pattern && prev[-3] == '\\')));
+ || ((syntax & RE_SHY_GROUPS)
+ && ((prev - 2 >= pattern
+ && prev[-1] == '?' && prev[-2] == '('
+ && (syntax & RE_NO_BK_PARENS
+ || (prev - 3 >= pattern && prev[-3] == '\\')))
+ || (prev - 3 >= pattern
+ && (*prev == '=' || *prev == '!')
+ && prev[-1] == '<' && prev[-2] == '?' && prev[-3] == '('
+ && (syntax & RE_NO_BK_PARENS
+ || (prev - 4 >= pattern && prev[-4] == '\\')))));
}
@@ -4198,6 +4324,13 @@
}
break;
+ case lookahead:
+ case lookahead_not:
+ case lookbehind:
+ case lookbehind_not:
+ if (!fastmap) break;
+ return -1;
+
/* All cases after this match the empty string. These end with
`continue'. */
@@ -4827,7 +4960,7 @@
{
case start_memory:
case stop_memory:
- p += 2; break;
+ p += 2; break;
case no_op:
p += 1; break;
case jump:
@@ -4843,6 +4976,93 @@
return p;
}
+static int
+exact_chars_in_pattern_buffer (bufp, p, pend)
+ struct re_pattern_buffer *bufp;
+ re_char *p, *pend;
+{
+ int count = 0;
+ while (p < pend)
+ {
+ switch (SWITCH_ENUM_CAST ((re_opcode_t) *p++))
+ {
+ case exactn:
+ {
+ int mcnt = *p++;
+ int buf_charlen;
+ while (mcnt > 0) {
+ STRING_CHAR_AND_LENGTH (p, pend - p, buf_charlen);
+ p += buf_charlen;
+ mcnt -= buf_charlen;
+ count++;
+ }
+ }
+ break;
+ case start_memory:
+ case stop_memory:
+ p++;
+ break;
+#ifdef emacs
+ case categoryspec:
+ case notcategoryspec:
+#endif /* emacs */
+ case syntaxspec:
+ case notsyntaxspec:
+ p++;
+ case anychar:
+ count++;
+ break;
+
+ case charset:
+ case charset_not:
+ if (CHARSET_RANGE_TABLE_EXISTS_P (p - 1))
+ {
+ int mcnt;
+ p = CHARSET_RANGE_TABLE (p - 1);
+ EXTRACT_NUMBER_AND_INCR (mcnt, p);
+ p = CHARSET_RANGE_TABLE_END (p, mcnt);
+ }
+ else
+ p += 1 + CHARSET_BITMAP_SIZE (p - 1);
+ count++;
+ break;
+
+#ifdef emacs
+ case before_dot:
+ case at_dot:
+ case after_dot:
+#endif /* emacs */
+ case no_op:
+ case begline:
+ case endline:
+ case begbuf:
+ case endbuf:
+ case wordbound:
+ case notwordbound:
+ case wordbeg:
+ case wordend:
+ case symbeg:
+ case symend:
+ /* Zero width. */
+ continue;
+ case lookahead:
+ case lookahead_not:
+ case lookbehind:
+ case lookbehind_not:
+ /* Skip to lookaround_succeed. */
+ while (p < pend)
+ {
+ if ((re_opcode_t) *p++ == lookaround_succeed)
+ break;
+ }
+ break;
+ default:
+ return -1;
+ }
+ }
+ return count;
+}
+
/* Non-zero if "p1 matches something" implies "p2 fails". */
static int
mutually_exclusive_p (bufp, p1, p2)
@@ -5200,6 +5420,9 @@
re_char **best_regstart, **best_regend;
#endif
+ /* Discard a saved register from the stack. */
+ boolean discard_saved_regs = 0;
+
/* Logically, this is `best_regend[0]'. But we don't want to have to
allocate space for that if we're not allocating space for anything
else (see below). Also, we never need info about register 0 for
@@ -5772,6 +5995,77 @@
p += 1;
break;
+ case lookahead:
+ case lookahead_not:
+ DEBUG_PRINT1 ((re_opcode_t) *(p - 1) == lookahead ? "EXECUTING lookahead.\n" : "EXECUTING lookahead_not.\n");
+
+ p += 2;
+ PUSH_FAILURE_POINT (p - 3, d);
+ break;
+
+ case lookbehind:
+ case lookbehind_not:
+ {
+ int mcnt, count;
+ boolean not = (re_opcode_t) *(p - 1) != lookbehind;
+
+ EXTRACT_NUMBER_AND_INCR (mcnt, p);
+ EXTRACT_NUMBER_AND_INCR (count, p);
+
+ DEBUG_PRINT2 (not
+ ? "EXECUTING lookbehind_not %d.\n"
+ : "EXECUTING lookbehind %d.\n", count);
+
+ dfail = d;
+ while (d != string1 && count > 0)
+ {
+ if (d == string2)
+ {
+ if (!string1)
+ break;
+ d = end1;
+ dend = end_match_1;
+ }
+
+ if (target_multibyte)
+ {
+ re_char *dhead = (d >= string1 && d <= end1) ? string1 : string2;
+ PREV_CHAR_BOUNDARY (d, dhead);
+ }
+ else
+ d--;
+ count--;
+ }
+
+ if (count > 0)
+ {
+ if (not)
+ {
+ /* There is not enough string to match,
+ so just make it succeed here. */
+ d = dfail;
+ p = p - 2 + mcnt;
+ break;
+ }
+ else
+ goto fail;
+ }
+
+ PUSH_FAILURE_POINT (p - 5, dfail);
+ }
+ break;
+
+ case lookaround_succeed:
+ DEBUG_PRINT1 ("EXECUTING lookaround_succeed.\n");
+
+ FINISH_LOOKAROUND();
+ break;
+
+ case lookaround_fail:
+ DEBUG_PRINT1 ("EXECUTING lookaround_fail.\n");
+
+ FINISH_LOOKAROUND();
+ goto fail;
/* \<digit> has been turned into a `duplicate' command which is
followed by the numeric value of <digit> as the register number. */
@@ -6413,12 +6707,16 @@
case on_failure_jump_loop:
case on_failure_jump:
case succeed_n:
+ case lookahead_not:
+ case lookbehind_not:
d = str;
continue_failure_jump:
EXTRACT_NUMBER_AND_INCR (mcnt, pat);
p = pat + mcnt;
break;
+ case lookahead:
+ case lookbehind:
case no_op:
/* A special frame used for nastyloops. */
goto fail;
Test cases:
;; -*-coding:utf-8-*-
(defvar test-counter 0)
(defmacro test (&rest form)
`(princ-list (format "%d ... " (setq test-counter (1+ test-counter)))
(condition-case nil
(if (progn ,@form) 'ok 'fail)
(error 'invalid))))
(defun expect-invalid (regexp)
(test (condition-case nil
(prog1 nil (string-match regexp ""))
(error t))))
(defun expect-match (regexp string &optional group-number group-string)
(test (and (string-match regexp string)
(if group-number
(equal (match-string group-number string) group-string)
t))))
(defun expect-not-match (regexp string)
(test (not (string-match regexp string))))
(expect-match "\\(?=\\)" "")
(expect-not-match "\\(?=a\\)" "")
(expect-match "a\\(?=b\\)b" "ab")
(expect-not-match "a\\(?=b\\)c" "ab")
(expect-match "\\(?=a\\)a" "a")
(expect-not-match "\\(?=b\\)a" "a")
(expect-match "\\(?=^\\)a" "a")
(expect-match "a\\(?=$\\)$" "a")
(expect-match "a\\(?=\\)$" "a")
(expect-match "a\\(?=.*c\\)b" "abc")
(expect-not-match "a\\(?=.*d\\)b" "abc")
(expect-match "a\\(?=b\\|c\\|d\\|e\\)" "ae")
(expect-not-match "a\\(?=b\\|c\\|d\\|e\\)" "af")
(expect-match "a\\(?=\\(b\\)\\)b" "ab" 1 "b")
(expect-match "a\\(\\(?=b\\)\\)" "ab" 1 "")
(expect-match "a\\(?=\\(b\\)\\)" "ab" 1 "b")
(expect-match "\\(a\\(?=\\(b\\)\\)\\2\\)\\1" "abab" 1 "ab")
(expect-not-match "\\(a\\)\\(?=\\(b\\)\\)\\1" "ab")
(expect-match "\\(a\\(?=b\\(?=c\\)\\)\\)" "abc" 1 "a")
(expect-not-match "\\(a\\(?=b\\(?=c\\)\\)\\)" "abd")
(expect-not-match "\\(?!\\)" "")
(expect-match "\\(?!a\\)" "")
(expect-not-match "a\\(?!b\\)b" "ab")
(expect-match "a\\(?!b\\)c" "ac")
(expect-not-match "\\(?!a\\)a" "a")
(expect-match "\\(?!b\\)a" "a")
(expect-match "\\(?!^\\)a" "ba")
(expect-not-match "\\(?!^\\)a" "a")
(expect-not-match "a\\(?!$\\)$" "a")
(expect-not-match "a\\(?!\\)$" "a")
(expect-not-match "a\\(?!.*c\\)b" "abc")
(expect-match "a\\(?!.*d\\)b" "abc")
(expect-not-match "a\\(?!b\\|c\\|d\\|e\\)" "ae")
(expect-match "a\\(?!b\\|c\\|d\\|e\\)" "af")
(expect-match "a\\(?!\\(b\\)\\)c" "ac")
(expect-match "a\\(\\(?!b\\)\\)" "ac")
(expect-match "a\\(?!b\\(?!c\\)\\)" "abc")
(expect-not-match "a\\(?!b\\(?=\\(c\\)\\)\\)" "abc")
(expect-not-match "a\\(?!b\\(?!c\\)\\)" "abd")
(expect-match "\\(?<=\\)" "")
(expect-not-match "\\(?<=a\\)" "")
(expect-match "\\(?<=a\\)" "a")
(expect-not-match "\\(?<=b\\)" "a")
(expect-match "\\(?<=^\\)" "")
(expect-not-match "a\\(?<=^\\)" "")
(expect-match "\\(?<=$\\)" "")
(expect-not-match "\\(?<=$\\)a" "")
(expect-match "\\(?<=a\\)b" "ab")
(expect-not-match "\\(?<=c\\)b" "ab")
(expect-match "\\(?<=\\(?<=a\\)\\)b" "ab")
(expect-not-match "\\(?<=\\(?<=b\\)\\)b" "ab")
(expect-match "\\(?<=\\(?=a\\).\\)b" "ab")
(expect-match "\\(?<=\\(a\\)\\)b\\1" "aba" 1 "a")
(expect-match "\\(?<=.\\)a" "aa")
(expect-match "\\(?<=\\(.\\)\\)a" "aa")
(expect-match "\\(?<=\\w\\)a" "aa")
(expect-not-match "\\(?<=\\w\\)a" "!a")
(expect-match "\\(?<=\\sw\\)a" "aa")
(expect-not-match "\\(?<=\\sw\\)a" "!a")
(expect-match "\\(?<=\\cg\\)a" "λa")
(expect-not-match "\\(?<=\\Cg\\)a" "λa")
(expect-match "\\(?<=[a-z]\\)" "aa")
(expect-not-match "\\(?<=[a-z]\\)a" "1a")
(expect-match "\\(?<=[^a-z]\\)" "1a")
(expect-not-match "\\(?<=[^a-z]\\)" "aa")
(expect-match "\\(?<=[:ascii:]\\)a" "aa")
(expect-match "\\(?<=\\`\\)" "")
(expect-not-match "a\\(?<=\\`\\)" "a")
(expect-match "\\(?<=\\'\\)" "")
(expect-not-match "\\(?<=\\'\\)a" "a")
(expect-not-match "\\(?<=\\=\\)" "")
(expect-match "\\(?<=\\b\\)a" "a")
(expect-not-match "a\\(?<=\\b\\)b" "ab")
(expect-match "\\(?<=\\B\\)a" "aa")
(expect-not-match "\\(?<=\\B\\)a" " a")
(expect-match "\\(?<=\\<\\)a" "a")
(expect-not-match "a\\(?<=\\<\\)b" "ab")
(expect-match "a\\(?<=\\>\\)" "a")
(expect-not-match "a\\(?<=\\>\\)b" "ab")
(expect-match "\\(?<=\\_<\\)a" "a")
(expect-not-match "a\\(?<=\\_<\\)b" "ab")
(expect-match "a\\(?<=\\_>\\)" "a")
(expect-not-match "a\\(?<=\\_>\\)b" "ab")
(expect-invalid "\\(?<=\\(.\\)\\1\\)") ; duplicate
(expect-invalid "\\(?<=a*\\)") ; variable width
(expect-invalid "\\(?<=a*?\\)") ; variable width
(expect-invalid "\\(?<=a+\\)") ; variable width
(expect-invalid "\\(?<=a+?\\)") ; variable width
(expect-invalid "\\(?<=a?\\)") ; variable width
(expect-invalid "\\(?<=a??\\)") ; variable width
(expect-invalid "\\(?<=a\\{1,4\\}\\)") ; variable width
(expect-invalid "\\(?<=a\\|bb\\|ccc\\)") ; variable width
(expect-invalid "\\(?<=a\\{4\\}\\)") ; fixed width but not supported yet
(expect-invalid "\\(?<=a\\|\\b\\c\\)") ; fixed width but not supported yet
(expect-not-match "\\(?<!\\)" "")
(expect-match "\\(?<!a\\)" "")
(expect-match "\\(?<!a\\)" "a")
(expect-not-match "\\(?<!a\\)b" "ab")
(expect-match "\\(?<!b\\)" "a")
(expect-not-match "\\(?<!^\\)" "")
(expect-not-match "a\\(?<!^\\)" "")
(expect-not-match "\\(?<!$\\)" "")
(expect-match "\\(?<=a\\)b" "ab")
(expect-match "\\(?<!c\\)b" "ab")
(expect-match "\\(?<!\\(?<!a\\)\\)b" "ab")
(expect-not-match "\\(?<!\\(?<!b\\)\\)b" "ab")
(expect-match "\\(?<!\\(?!a\\).\\)b" "ab")
(expect-match "\\(?<!.\\)a" "aa")
(expect-not-match "\\(?<!.\\)b" "ab")
(expect-not-match "\\(?<!\\(.\\)\\)b" "ab")
(expect-not-match "\\(?<!\\w\\)b" "ab")
(expect-not-match "\\(?<!\\w\\)b" "ab")
(expect-not-match "\\(?<!\\sw\\)b" "ab")
(expect-match "\\(?<!\\sw\\)a" "!a")
(expect-not-match "\\(?<!\\cg\\)a" "λa")
(expect-match "\\(?<!\\Cg\\)a" "λa")
(expect-match "\\(?<![a-z]\\)" "aa")
(expect-match "\\(?<![a-z]\\)a" "1a")
(expect-not-match "\\(?<![^a-z]\\)a" "1a")
(expect-not-match "\\(?<![:ascii:]\\)b" "ab")
(expect-not-match "\\(?<!\\`\\)" "")
(expect-match "a\\(?<!\\`\\)" "a")
(expect-not-match "\\(?<!\\'\\)" "")
(expect-match "\\(?<!\\'\\)a" "a")
(expect-match "\\(?<!\\=\\)" "")
(expect-not-match "\\(?<!\\b\\)a" "a")
(expect-match "a\\(?<!\\b\\)b" "ab")
(expect-not-match "\\(?<!\\B\\)b" "ab")
(expect-match "\\(?<!\\B\\)a" " a")
(expect-not-match "\\(?<!\\<\\)a" "a")
(expect-match "a\\(?<!\\<\\)b" "ab")
(expect-not-match "a\\(?<!\\>\\)" "a")
(expect-match "a\\(?<!\\>\\)b" "ab")
(expect-not-match "\\(?<!\\_<\\)a" "a")
(expect-match "a\\(?<!\\_<\\)b" "ab")
(expect-not-match "a\\(?<!\\_>\\)" "a")
(expect-match "a\\(?<!\\_>\\)b" "ab")
(expect-invalid "\\(?<!\\(.\\)\\1\\)") ; duplicate
(expect-invalid "\\(?<!a*\\)") ; variable width
(expect-invalid "\\(?<!a*?\\)") ; variable width
(expect-invalid "\\(?<!a+\\)") ; variable width
(expect-invalid "\\(?<!a+?\\)") ; variable width
(expect-invalid "\\(?<!a?\\)") ; variable width
(expect-invalid "\\(?<!a??\\)") ; variable width
(expect-invalid "\\(?<!a\\{1,4\\}\\)") ; variable width
(expect-invalid "\\(?<!a\\|bb\\|ccc\\)") ; variable width
(expect-invalid "\\(?<!a\\{4\\}\\)") ; fixed width but not supported yet
(expect-invalid "\\(?<!a\\|\\b\\c\\)") ; fixed width but not supported yet
(expect-match "Hello, \\(?=世界\\)" "Hello, 世界!")
(expect-not-match "Hello, \\(?=せかい\\)" "Hello, 世界!")
(expect-match "Hello, \\(?!せかい\\)" "Hello, 世界!")
(expect-not-match "Hello, \\(?!世界\\)" "Hello, 世界!")
(expect-match "\\(?<=こんにちは\\), World!" "こんにちは, World!")
(expect-not-match "\\(?<=こんにちわ\\), World!" "こんにちは, World!")
(expect-match "\\(?<!こんにちわ\\), World!" "こんにちは, World!")
(expect-not-match "\\(?<!こんにちは\\), World!" "こんにちは, World!")
(require 'cl)
(with-temp-buffer
(insert "abracadabra")
(goto-char (point-min))
(test (equal
(loop while (re-search-forward "a\\(?=b\\)" nil t)
collect (point))
'(2 9))))
(with-temp-buffer
(insert "abracadabra")
(test (equal
(loop while (re-search-backward "a\\(?=b\\)" nil t)
collect (point))
'(8 1))))
(with-temp-buffer
(insert "abracadabra")
(goto-char (point-min))
(test (equal
(loop while (re-search-forward "\\(?<=a\\)b" nil t)
collect (point))
'(3 10))))
(with-temp-buffer
(insert "abracadabra")
(test (equal
(loop while (re-search-backward "\\(?<=a\\)b" nil t)
collect (point))
'(9 2))))
(with-temp-buffer
(insert "abcdebc")
(goto-char 3)
(test (eq (re-search-forward "\\(?<=b\\)c" nil t) 4)))
(with-temp-buffer
(insert "abcdebc")
(goto-char 7)
;; search-backward with lookahead over bound is not supported yet
(test (eq (re-search-backward "b\\(?=c\\)" nil t) 2)))
(when (member "perf" argv)
;; generate big file
(require 'find-func)
(let ((file (concat (or find-function-C-source-directory "~/src/emacs") "/xdisp.c"))
count)
(with-temp-buffer
(insert-file-contents file)
(dolist (pair '((point-min . re-search-forward) (point-max . re-search-backward)))
(dolist (regexp '("unsigned \\(?:char\\|int\\|long\\)" "unsigned \\(?=char\\|int\\|long\\)"
"\\(?:unsigned \\)int" "\\(?<=unsigned \\)int"))
(setq count 0)
(princ-list regexp
": "
(car
(benchmark-run
10
(progn
(goto-char (funcall (car pair)))
(while (funcall (cdr pair) regexp nil t)
(setq count (1+ count))))))
" elapsed (" count " found)"))))))
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#5393: control message for bug 5393
2016-02-29 2:30 ` Lars Ingebrigtsen
@ 2019-06-27 17:41 ` Lars Ingebrigtsen
0 siblings, 0 replies; 5+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-27 17:41 UTC (permalink / raw)
To: Stephen Berman; +Cc: 5393
The original suggestion was:
---
I have attached a patch that enables you to
use lookaround assertions in regexps
with the following syntax:
* Positive lookahead assertion
\(?=...\)
* Negative lookahead assertion
\(?!...\)
* Positive lookbehind assertion
\(?<=...\)
* Negative lookbehind assertion
\(?<!...\)
Basically, it works the same as Perl's.
Spec:
* Any pattern is allowed in a lookahead assertion.
* Nested lookaround assertions are allowed.
* Capturing is allowed only in positive lookahead/lookbehind assertions.
* Duplication is allowed after such an assertion.
* Variable-length patterns are NOT yet allowed in lookbehind assertions.
[x] \(?<=[0-9]+\)MB
[o] \(?<=[0-9][0-9][0-9][0-9]\)MB
* A lookahead assertion over the start bound is not allowed in re-search-backward.
(re-search-backward "\(?<=a\)b") for buffer "abca_|_b"
will seek to the first "ab".
---
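The fixed-width restriction on lookbehind corresponds to the patch's exact_chars_in_pattern_buffer, which counts characters in the compiled group and returns -1 when the width cannot be determined. A minimal sketch of that idea follows. It assumes a hypothetical toy opcode set (the TOY_* names and toy_fixed_width are invented for illustration and are not Emacs's re_opcode_t or any function in src/regex.c):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, for illustration only; these are not
   the names or values used in Emacs's src/regex.c.  */
enum toy_op
{
  TOY_EXACT,   /* followed by one operand: the number of characters */
  TOY_ANYCHAR, /* matches exactly one character */
  TOY_BEGLINE, /* zero-width anchor */
  TOY_STAR     /* repetition: makes the width variable */
};

/* Return the number of characters the toy pattern CODE[0..LEN)
   always matches, or -1 if the width is not fixed.  */
static int
toy_fixed_width (const int *code, size_t len)
{
  int count = 0;
  size_t i = 0;

  assert (code != NULL || len == 0);

  while (i < len)
    {
      switch (code[i++])
        {
        case TOY_EXACT:
          if (i >= len)
            return -1;          /* truncated pattern */
          count += code[i++];   /* operand is the character count */
          break;
        case TOY_ANYCHAR:
          count++;
          break;
        case TOY_BEGLINE:
          break;                /* zero width */
        default:
          return -1;            /* variable width or unknown opcode */
        }
    }
  return count;
}
```

In this sketch, a sequence of four TOY_ANYCHAR items (roughly the shape of \(?<=[0-9][0-9][0-9][0-9]\)) has width 4, while anything containing TOY_STAR (roughly \(?<=[0-9]+\)) yields -1, which is the case the patch turns into REG_BADPAT.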
That looks pretty cool to me, but there was next to no discussion about
this at the time (2009), which is a bit odd...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no