unofficial mirror of emacs-devel@gnu.org 
* Embedded modifiers in the regex engine
@ 2016-02-25  1:32 Dima Kogan
  2016-02-25  6:11 ` John Wiegley
  2016-02-25 16:15 ` Eli Zaretskii
  0 siblings, 2 replies; 22+ messages in thread
From: Dima Kogan @ 2016-02-25  1:32 UTC (permalink / raw)
  To: emacs-devel

Hi.

I've been thinking of ways to make some fancier aspects of isearch and
hi-lock work better, specifically, the way we handle the different
modes: case-fold, char-fold, lax-whitespace, etc.

The relevant bugs I filed recently:

  http://debbugs.gnu.org/22541
  http://debbugs.gnu.org/22520
  http://debbugs.gnu.org/22479

In short, different parts of emacs (isearch, isearch history, hi-lock,
etc) treat these modes inconsistently, which results in unexpected
behavior.

The best solution I can think of to clean this up is also the most
intrusive: adding support for pcre-style embedded modifiers to
activate/deactivate the modes.

So for instance "\\(?i\\)asdf" would be interpreted as a case-folding
regex regardless of the value of case-fold-search. I think this would be
a great thing to have in general, but for the specific issues in the
bugs above, it'd make things simpler and more correct.

As an example, currently hi-lock generates a complicated-looking regex
to emulate char-folding and case-folding. If we supported the modifiers,
this change would simply be a prepend of "\\(?i\\)" or whatever other
modes we want. This is simple and expected to be bug-free on the hi-lock
level. Bugs such as hi-lock not supporting char-fold and case-fold at
the same time would not happen.
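
For comparison, here is how embedded modifiers behave in an engine that
already supports them. This is only a sketch using Python's `re` module,
not the Emacs regex engine, and Python writes the modifier as "(?i)"
rather than "\\(?i\\)". The `add_case_fold` helper is hypothetical; it
just illustrates the "prepend a modifier" idea described above.

```python
import re


def add_case_fold(pattern: str) -> str:
    """Hypothetical helper: request case-folding by prepending a modifier."""
    return "(?i)" + pattern


# Without the modifier, matching is case-sensitive unless a flag is passed.
plain = re.compile("asdf")

# With the embedded modifier, the pattern string itself carries the mode,
# so no external setting has to be consulted by the caller.
folded = re.compile(add_case_fold("asdf"))

print(plain.search("xxASDFxx"))           # None
print(folded.search("xxASDFxx").span())   # (2, 6)
```

The point of the design is that the mode travels with the regex string,
so any consumer of that string sees the same behavior.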

Clearly this is a big change to a core component, so I want to talk
about it first. I looked at our regex implementation, and it looks
possible to add this. But I've seen talk of merging our regex
implementation with the glibc one, so the merge should clearly happen
first.

Also I don't want to touch this without a test suite for our regex
engine. So that would need to happen beforehand as well.

Again, I think this feature would be useful even beyond the context of
these bugs. Thanks for the input.

dima



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Embedded modifiers in the regex engine
  2016-02-25  1:32 Embedded modifiers in the regex engine Dima Kogan
@ 2016-02-25  6:11 ` John Wiegley
  2016-02-25 21:05   ` Stefan Monnier
  2016-02-25 16:15 ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: John Wiegley @ 2016-02-25  6:11 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

>>>>> Dima Kogan <dima@secretsauce.net> writes:

> So for instance "\\(?i\\)asdf" would be interpreted as a case-folding regex
> regardless of the value of case-fold-search. I think this would be a great
> thing to have in general, but for the specific issues in the bugs above,
> it'd make things simpler and more correct.

This does sound promising.

> Clearly this is a big change to a core component, so I want to talk about it
> first. I looked at our regex implementation, and it looks possible to add
> this. But I've seen talk of merging our regex implementation with the glibc
> one, so the merge should clearly happen first.

There's also been talk of rewriting the regex code, since it has acquired
much baggage over the years.

> Also I don't want to touch this without a test suite for our regex engine.
> So that would need to happen beforehand as well.

Absolutely.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: Embedded modifiers in the regex engine
  2016-02-25  1:32 Embedded modifiers in the regex engine Dima Kogan
  2016-02-25  6:11 ` John Wiegley
@ 2016-02-25 16:15 ` Eli Zaretskii
  2016-02-26  7:19   ` Dima Kogan
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-02-25 16:15 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <dima@secretsauce.net>
> Date: Wed, 24 Feb 2016 17:32:01 -0800
> 
> I've been thinking of ways to make some fancier aspects of isearch and
> hi-lock work better, specifically, the way we handle the different
> modes: case-fold, char-fold, lax-whitespace, etc.
> 
> The relevant bugs I filed recently:
> 
>   http://debbugs.gnu.org/22541
>   http://debbugs.gnu.org/22520
>   http://debbugs.gnu.org/22479
> 
> In short, different parts of emacs (isearch, isearch history, hi-lock,
> etc) treat these modes inconsistently, which results in unexpected
> behavior.
> 
> The best solution I can think of to clean this up is also the most
> intrusive: adding support for pcre-style embedded modifiers to
> activate/deactivate the modes.
> 
> So for instance "\\(?i\\)asdf" would be interpreted as a case-folding
> regex regardless of the value of case-fold-search. I think this would be
> a great thing to have in general, but for the specific issues in the
> bugs above, it'd make things simpler and more correct.

I hope you are not proposing this as a replacement for the M-s
toggles, because if so, I'm very much opposed.

> As an example, currently hi-lock generates a complicated-looking regex
> to emulate char-folding and case-folding. If we supported the modifiers,
> this change would simply be a prepend of "\\(?i\\)" or whatever other
> modes we want. This is simple and expected to be bug-free on the hi-lock
> level. Bugs such as hi-lock not supporting char-fold and case-fold at
> the same time would not happen.

They will also not happen once character-folding is implemented via
translation tables, instead of regular expressions.  The current
implementation will go away at some point (one hopes).




* Re: Embedded modifiers in the regex engine
  2016-02-25  6:11 ` John Wiegley
@ 2016-02-25 21:05   ` Stefan Monnier
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Monnier @ 2016-02-25 21:05 UTC (permalink / raw)
  To: emacs-devel

> There's also been talk of rewriting the regex code, since it has acquired
> much baggage over the years.

Actually, I don't think it has much baggage, because it mostly hasn't
been touched over the years (at least the regex.c code).
The main problem with it is the fact that it's a plain backtracking
algorithm (i.e. exponential worst case), and that it's our own so it
doesn't benefit from improvements added to other regexp engines.


        Stefan





* Re: Embedded modifiers in the regex engine
  2016-02-25 16:15 ` Eli Zaretskii
@ 2016-02-26  7:19   ` Dima Kogan
  2016-02-26  9:08     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Dima Kogan @ 2016-02-26  7:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On February 25, 2016 8:15:52 AM PST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Dima Kogan <dima@secretsauce.net>
>> Date: Wed, 24 Feb 2016 17:32:01 -0800
>> 
>> I've been thinking of ways to make some fancier aspects of isearch and
>> hi-lock work better, specifically, the way we handle the different
>> modes: case-fold, char-fold, lax-whitespace, etc.
>> <snip>
>> 
>> The best solution I can think of to clean this up is also the most
>> intrusive: adding support for pcre-style embedded modifiers to
>> activate/deactivate the modes.
>> 
>> So for instance "\\(?i\\)asdf" would be interpreted as a case-folding
>> regex regardless of the value of case-fold-search.
>
>I hope you are not proposing this as a replacement for the M-s
>toggles, because if so, I'm very much opposed.

This proposal is concerned only with the internals of the regex parser,
so I'm not trying to break the M-s toggles.


>> As an example, currently hi-lock generates a complicated-looking regex
>> to emulate char-folding and case-folding. If we supported the modifiers,
>> this change would simply be a prepend of "\\(?i\\)" or whatever other
>> modes we want. This is simple and expected to be bug-free on the hi-lock
>> level. Bugs such as hi-lock not supporting char-fold and case-fold at
>> the same time would not happen.
>
>They will also not happen once character-folding is implemented via
>translation tables, instead of regular expressions.  The current
>implementation will go away at some point (one hopes).

OK, that's good to hear, but char-fold isn't causing all the funkiness here.

I'm looking at importing the regex test suite in glibc to emacs. Would this be possible even if the copyright holders of those tests haven't assigned their work to the fsf? These are tests and not part of emacs on some level, so would that make it ok?





* Re: Embedded modifiers in the regex engine
  2016-02-26  7:19   ` Dima Kogan
@ 2016-02-26  9:08     ` Eli Zaretskii
  2016-02-28  1:50       ` Dima Kogan
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-02-26  9:08 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <dima@secretsauce.net>
> Date: Thu, 25 Feb 2016 23:19:32 -0800
> CC: emacs-devel@gnu.org
> 
> I'm looking at importing the regex test suite in glibc to emacs. Would this be possible even if the copyright holders of those tests haven't assigned their work to the fsf? These are tests and not part of emacs on some level, so would that make it ok?

If the test suite is GPL v3+, I don't think there should be a
problem.  But IANAL.




* Re: Embedded modifiers in the regex engine
  2016-02-26  9:08     ` Eli Zaretskii
@ 2016-02-28  1:50       ` Dima Kogan
  2016-02-28 16:00         ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Dima Kogan @ 2016-02-28  1:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Dima Kogan <dima@secretsauce.net>
>> Date: Thu, 25 Feb 2016 23:19:32 -0800
>> CC: emacs-devel@gnu.org
>> 
>> I'm looking at importing the regex test suite in glibc to emacs.
>> Would this be possible even if the copyright holders of those tests
>> haven't assigned their work to the fsf? These are tests and not part
>> of emacs on some level, so would that make it ok?
>
> If the test suite is GPL v3+, I don't think there should be a
> problem

If only. Some of the test cases in glibc are their own (LGPL) and some
others came from other projects (boost, BSD, MIT). To be clear, I'm
talking about the test cases themselves, not the code that evaluates the
tests. Does this mean we need to write our own tests if we want them?

I did write an evaluator for one of the sets of tests. It told me that
the emacs regexen don't support the [.xxx.] and [=xxx=] constructs,
which we already know. It also told me that the glibc regex engine
throws an error when you give it an invalid range, such as [b-a], but
emacs silently matches nothing. I think this is a bug, but probably not
one that's worth fixing on its own.
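
To illustrate the "reject at compile time" behavior, here is a small
sketch in Python. It uses Python's `re` module as a stand-in, so it
shows neither glibc nor Emacs; the `compiles` helper is hypothetical.
The second pattern is the same kind of reversed range that BOOST.tests
marks with "!" (must not compile), as in "a[3-1]c".

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: report whether the engine accepts PATTERN."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python's engine rejects a reversed range such as [b-a]
        # instead of silently compiling it into "match nothing".
        return False


print(compiles("[a-b]"))    # True
print(compiles("[b-a]"))    # False
print(compiles("a[3-1]c"))  # False
```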

From the earlier emails it wasn't obvious if there was already a
long-term plan to replace the regex engine. Is there such a plan? What
do we hope to move to?

Thanks!




* Re: Embedded modifiers in the regex engine
  2016-02-28  1:50       ` Dima Kogan
@ 2016-02-28 16:00         ` Eli Zaretskii
  2016-02-29 13:30           ` Richard Stallman
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-02-28 16:00 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <lists@dima.secretsauce.net>
> Cc: emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 17:50:01 -0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Dima Kogan <dima@secretsauce.net>
> >> Date: Thu, 25 Feb 2016 23:19:32 -0800
> >> CC: emacs-devel@gnu.org
> >> 
> >> I'm looking at importing the regex test suite in glibc to emacs.
> >> Would this be possible even if the copyright holders of those tests
> >> haven't assigned their work to the fsf? These are tests and not part
> >> of emacs on some level, so would that make it ok?
> >
> > If the test suite is GPL v3+, I don't think there should be a
> > problem
> 
> If only. Some of the test cases in glibc are their own (LGPL) and some
> others came from other projects (boost, BSD, MIT). To be clear, I'm
> talking about the test cases themselves, not the code that evaluates the
> tests. Does this mean we need to write our own tests if we want them?

Not sure.  Probably someone like Richard needs to look closer at the
licenses.

> From the earlier emails it wasn't obvious if there was already a
> long-term plan to replace the regex engine. Is there such a plan? What
> do we hope to move to?

We have plans, yes, but AFAIK no one is working on that.




* Re: Embedded modifiers in the regex engine
  2016-02-28 16:00         ` Eli Zaretskii
@ 2016-02-29 13:30           ` Richard Stallman
  2016-03-01  0:49             ` Aurélien Aptel
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2016-02-29 13:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lists, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > If only. Some of the test cases in glibc are their own (LGPL) and some
  > > others came from other projects (boost, BSD, MIT).

I don't know what "boost" means.

Please do not use "BSD" to name a kind of licensing, because there are
two different BSD licenses and they are not equivalent.  One is
compatible with the GPL and one is incompatible with it.  It is
crucial to distinguish these two licenses, every time, without
exception.  See http://gnu.org/licenses/bsd.html.

There are also two different licenses that people sometimes
refer to as "MIT", so it is best not to use that term either.

What I can say in general is that code under any GPL-compatible
license can be linked or merged with GPL-covered code, so a fortiori a
test case under a GPL-compatible license can't be a problem either.


-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Embedded modifiers in the regex engine
  2016-02-29 13:30           ` Richard Stallman
@ 2016-03-01  0:49             ` Aurélien Aptel
  2016-03-01 16:55               ` Richard Stallman
  0 siblings, 1 reply; 22+ messages in thread
From: Aurélien Aptel @ 2016-03-01  0:49 UTC (permalink / raw)
  To: rms; +Cc: lists, Eli Zaretskii, Emacs development discussions

On Mon, Feb 29, 2016 at 2:30 PM, Richard Stallman <rms@gnu.org> wrote:
> I don't know what "boost" means.

boost refers to the Boost Software License. It is listed on the GNU
website as a GPL compatible license:

http://www.gnu.org/licenses/license-list.en.html#boost




* Re: Embedded modifiers in the regex engine
  2016-03-01  0:49             ` Aurélien Aptel
@ 2016-03-01 16:55               ` Richard Stallman
  2016-03-09  0:34                 ` Dima Kogan
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2016-03-01 16:55 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: lists, eliz, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > boost refers to the Boost Software License. It is listed on the GNU
  > website as a GPL compatible license:

That licence is certainly no problem for a test case.

It is even permissible to link Emacs with code under any
GPL-compatible license.  However, it is not clear that test cases need
to have GPL-compatible licenses.  They are not really part of Emacs.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Embedded modifiers in the regex engine
  2016-03-01 16:55               ` Richard Stallman
@ 2016-03-09  0:34                 ` Dima Kogan
  2016-03-28  0:23                   ` Dima Kogan
  0 siblings, 1 reply; 22+ messages in thread
From: Dima Kogan @ 2016-03-09  0:34 UTC (permalink / raw)
  Cc: emacs-devel

On March 1, 2016 9:55:07 AM MST, Richard Stallman <rms@gnu.org> wrote:
>
>It is even permissible to link Emacs with code under any
>GPL-compatible license.  However, it is not clear that test cases need
>to have GPL-compatible licenses.  They are not really part of Emacs.

OK. Thank you for the clarification. I'll work on getting these into our test suite as the first step in an eventual update to our regex engine.





* Re: Embedded modifiers in the regex engine
  2016-03-09  0:34                 ` Dima Kogan
@ 2016-03-28  0:23                   ` Dima Kogan
  2016-03-28 15:28                     ` Eli Zaretskii
  2016-03-28 16:09                     ` Stefan Monnier
  0 siblings, 2 replies; 22+ messages in thread
From: Dima Kogan @ 2016-03-28  0:23 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 490 bytes --]

Dima Kogan <dima@secretsauce.net> writes:

> OK. Thank you for the clarification. I'll work on getting these into
> our test suite as the first step in an eventual update to our regex
> engine.

Sorry for the delay. Attached are two patches to import most of glibc's
regex tests to emacs. Assuming these are acceptable to be merged into
our tree, what specifically are people thinking in terms of updates to
the regex engine? Is there a particular implementation that we were
considering?
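
For readers who want to experiment with the imported files, here is a
rough sketch of a reader for the BOOST.tests line format, following the
header comments in that file. It is not the evaluator mentioned earlier
in this thread. The names `RegexTest` and `parse_tests` are
hypothetical, only whole-line ";" comments are handled, and escapes
such as \n in the subject string are left undecoded.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RegexTest:
    flags: List[str]              # flag names in effect for this test
    pattern: str                  # expression to compile
    subject: Optional[str]        # None when the pattern must not compile
    spans: List[Tuple[int, int]]  # (start, end) per subexpression, 0 = whole


# A token is either a double-quoted string or a run of non-whitespace.
_TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_tests(text: str) -> List[RegexTest]:
    flags: List[str] = []
    tests: List[RegexTest] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue                      # blank line or comment
        if stripped.startswith("-"):
            flags = stripped[1:].split()  # new flag set for later tests
            continue
        tokens = _TOKEN.findall(stripped)
        pattern = tokens[0]
        if len(tokens) >= 2 and tokens[1] == "!":
            tests.append(RegexTest(list(flags), pattern, None, []))
            continue
        subject = tokens[1]
        if len(subject) >= 2 and subject[0] == subject[-1] == '"':
            subject = subject[1:-1]       # drop the surrounding quotes
        numbers = [int(tok) for tok in tokens[2:]]
        spans = list(zip(numbers[0::2], numbers[1::2]))
        tests.append(RegexTest(list(flags), pattern, subject, spans))
    return tests


sample = r'''; a comment
- match_default normal REG_EXTENDED
a a 0 1
(a) zzzaazz 3 4 3 4
( !
[[:blank:]]+ "a  \tb" 1 4
'''

for test in parse_tests(sample):
    print(test.pattern, repr(test.subject), test.spans)
```

Keeping the parser separate from the code that runs each pattern makes
it possible to feed the same cases to more than one engine.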


[-- Attachment #2: 0001-New-regex-tests-imported-from-glibc-2.21.patch --]
[-- Type: text/x-diff, Size: 64674 bytes --]

From 6601983d8b58d39479bc13741f2780e749a54505 Mon Sep 17 00:00:00 2001
From: Dima Kogan <dima@secretsauce.net>
Date: Fri, 11 Mar 2016 18:18:14 -0800
Subject: [PATCH 1/2] New regex tests imported from glibc 2.21

* test/src/regex/regex-resources: new tests
---
 test/src/regex/regex-resources/BOOST.tests |  829 ++++++++++
 test/src/regex/regex-resources/PCRE.tests  | 2386 ++++++++++++++++++++++++++++
 test/src/regex/regex-resources/PTESTS      |  341 ++++
 test/src/regex/regex-resources/TESTS       |  167 ++
 4 files changed, 3723 insertions(+)
 create mode 100644 test/src/regex/regex-resources/BOOST.tests
 create mode 100644 test/src/regex/regex-resources/PCRE.tests
 create mode 100644 test/src/regex/regex-resources/PTESTS
 create mode 100644 test/src/regex/regex-resources/TESTS

diff --git a/test/src/regex/regex-resources/BOOST.tests b/test/src/regex/regex-resources/BOOST.tests
new file mode 100644
index 0000000..98fd3b6
--- /dev/null
+++ b/test/src/regex/regex-resources/BOOST.tests
@@ -0,0 +1,829 @@
+; 
+; 
+; this file contains a script of tests to run through regress.exe
+;
+; comments start with a semicolon and proceed to the end of the line
+;
+; changes to regular expression compile flags start with a "-" as the first
+; non-whitespace character and consist of a list of the printable names
+; of the flags, for example "match_default"
+;
+; Other lines contain a test to perform using the current flag status
+; the first token contains the expression to compile, the second the string
+; to match it against. If the second string is "!" then the expression should
+; not compile, that is the first string is an invalid regular expression.
+; This is then followed by a list of integers that specify what should match,
+; each pair represents the starting and ending positions of a subexpression
+; starting with the zeroth subexpression (the whole match).
+; A value of -1 indicates that the subexpression should not take part in the
+; match at all, if the first value is -1 then no part of the expression should
+; match the string.
+;
+; Tests taken from BOOST testsuite and adapted to glibc regex.
+;
+; Boost Software License - Version 1.0 - August 17th, 2003
+;
+; Permission is hereby granted, free of charge, to any person or organization
+; obtaining a copy of the software and accompanying documentation covered by
+; this license (the "Software") to use, reproduce, display, distribute,
+; execute, and transmit the Software, and to prepare derivative works of the
+; Software, and to permit third-parties to whom the Software is furnished to
+; do so, all subject to the following:
+;
+; The copyright notices in the Software and this entire statement, including
+; the above license grant, this restriction and the following disclaimer,
+; must be included in all copies of the Software, in whole or in part, and
+; all derivative works of the Software, unless such copies or derivative
+; works are solely in the form of machine-executable object code generated by
+; a source language processor.
+;
+; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+; FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+; SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+; FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+; ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+; DEALINGS IN THE SOFTWARE.
+;
+
+- match_default normal REG_EXTENDED
+
+;
+; try some really simple literals:
+a a 0 1
+Z Z 0 1
+Z aaa -1 -1
+Z xxxxZZxxx 4 5
+
+; and some simple brackets:
+(a) zzzaazz 3 4 3 4
+() zzz 0 0 0 0
+() "" 0 0 0 0
+( !
+) ) 0 1
+(aa !
+aa) baa)b 1 4
+a b -1 -1
+\(\) () 0 2
+\(a\) (a) 0 3
+\() () 0 2
+(\) !
+p(a)rameter ABCparameterXYZ 3 12 4 5
+[pq](a)rameter ABCparameterXYZ 3 12 4 5
+
+; now try escaped brackets:
+- match_default bk_parens REG_BASIC
+\(a\) zzzaazz 3 4 3 4
+\(\) zzz 0 0 0 0
+\(\) "" 0 0 0 0
+\( !
+\) !
+\(aa !
+aa\) !
+() () 0 2
+(a) (a) 0 3
+(\) !
+\() !
+
+; now move on to "." wildcards
+- match_default normal REG_EXTENDED REG_STARTEND
+. a 0 1
+. \n 0 1
+. \r 0 1
+. \0 0 1
+
+;
+; now move on to the repetition ops,
+; starting with operator *
+- match_default normal REG_EXTENDED
+a* b 0 0
+ab* a 0 1
+ab* ab 0 2
+ab* sssabbbbbbsss 3 10
+ab*c* a 0 1
+ab*c* abbb 0 4
+ab*c* accc 0 4
+ab*c* abbcc 0 5
+*a !
+\<* !
+\>* !
+\n* \n\n 0 2
+\** ** 0 2
+\* * 0 1
+
+; now try operator +
+ab+ a -1 -1
+ab+ ab 0 2
+ab+ sssabbbbbbsss 3 10
+ab+c+ a -1 -1
+ab+c+ abbb -1 -1
+ab+c+ accc -1 -1
+ab+c+ abbcc 0 5
++a !
+\<+ !
+\>+ !
+\n+ \n\n 0 2
+\+ + 0 1
+\+ ++ 0 1
+\++ ++ 0 2
+
+; now try operator ?
+- match_default normal REG_EXTENDED
+a? b 0 0
+ab? a 0 1
+ab? ab 0 2
+ab? sssabbbbbbsss 3 5
+ab?c? a 0 1
+ab?c? abbb 0 2
+ab?c? accc 0 2
+ab?c? abcc 0 3
+?a !
+\<? !
+\>? !
+\n? \n\n 0 1
+\? ? 0 1
+\? ?? 0 1
+\?? ?? 0 1
+
+; now try operator {}
+- match_default normal REG_EXTENDED
+a{2} a -1 -1
+a{2} aa 0 2
+a{2} aaa 0 2
+a{2,} a -1 -1
+a{2,} aa 0 2
+a{2,} aaaaa 0 5
+a{2,4} a -1 -1
+a{2,4} aa 0 2
+a{2,4} aaa 0 3
+a{2,4} aaaa 0 4
+a{2,4} aaaaa 0 4
+a{} !
+a{2 !
+a} a} 0 2
+\{\} {} 0 2
+
+- match_default normal REG_BASIC
+a\{2\} a -1 -1
+a\{2\} aa 0 2
+a\{2\} aaa 0 2
+a\{2,\} a -1 -1
+a\{2,\} aa 0 2
+a\{2,\} aaaaa 0 5
+a\{2,4\} a -1 -1
+a\{2,4\} aa 0 2
+a\{2,4\} aaa 0 3
+a\{2,4\} aaaa 0 4
+a\{2,4\} aaaaa 0 4
+{} {} 0 2
+
+; now test the alternation operator |
+- match_default normal REG_EXTENDED
+a|b a 0 1
+a|b b 0 1
+a(b|c) ab 0 2 1 2
+a(b|c) ac 0 2 1 2
+a(b|c) ad -1 -1 -1 -1
+a\| a| 0 2
+
+; now test the set operator []
+- match_default normal REG_EXTENDED
+; try some literals first
+[abc] a 0 1
+[abc] b 0 1
+[abc] c 0 1
+[abc] d -1 -1
+[^bcd] a 0 1
+[^bcd] b -1 -1
+[^bcd] d -1 -1
+[^bcd] e 0 1
+a[b]c abc 0 3
+a[ab]c abc 0 3
+a[^ab]c adc 0 3
+a[]b]c a]c 0 3
+a[[b]c a[c 0 3
+a[-b]c a-c 0 3
+a[^]b]c adc 0 3
+a[^-b]c adc 0 3
+a[b-]c a-c 0 3
+a[b !
+a[] !
+
+; then some ranges
+[b-e] a -1 -1
+[b-e] b 0 1
+[b-e] e 0 1
+[b-e] f -1 -1
+[^b-e] a 0 1
+[^b-e] b -1 -1
+[^b-e] e -1 -1
+[^b-e] f 0 1
+a[1-3]c a2c 0 3
+a[3-1]c !
+a[1-3-5]c !
+a[1- !
+
+; and some classes
+a[[:alpha:]]c abc 0 3
+a[[:unknown:]]c !
+a[[: !
+a[[:alpha !
+a[[:alpha:] !
+a[[:alpha,:] !
+a[[:]:]]b !
+a[[:-:]]b !
+a[[:alph:]] !
+a[[:alphabet:]] !
+[[:alnum:]]+ -%@a0X_- 3 6
+[[:alpha:]]+ -%@aX_0- 3 5
+[[:blank:]]+ "a  \tb" 1 4
+[[:cntrl:]]+ a\n\tb 1 3
+[[:digit:]]+ a019b 1 4
+[[:graph:]]+ " a%b " 1 4
+[[:lower:]]+ AabC 1 3
+; This test fails with STLPort, disable for now as this is a corner case anyway...
+;[[:print:]]+ "\na b\n" 1 4
+[[:punct:]]+ " %-&\t" 1 4
+[[:space:]]+ "a \n\t\rb" 1 5
+[[:upper:]]+ aBCd 1 3
+[[:xdigit:]]+ p0f3Cx 1 5
+
+; now test flag settings:
+- escape_in_lists REG_NO_POSIX_TEST
+[\n] \n 0 1
+- REG_NO_POSIX_TEST
+
+; line anchors
+- match_default normal REG_EXTENDED
+^ab ab 0 2
+^ab xxabxx -1 -1
+ab$ ab 0 2
+ab$ abxx -1 -1
+- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL REG_NOTEOL
+^ab ab -1 -1
+^ab xxabxx -1 -1
+ab$ ab -1 -1
+ab$ abxx -1 -1
+
+; back references
+- match_default normal REG_PERL
+a(b)\2c	!
+a(b\1)c	!
+a(b*)c\1d abbcbbd 0 7 1 3
+a(b*)c\1d abbcbd -1 -1
+a(b*)c\1d abbcbbbd -1 -1
+^(.)\1 abc -1 -1
+a([bc])\1d abcdabbd	4 8 5 6
+; strictly speaking this is at best ambiguous, at worst wrong, this is what most
+; re implementations will match though.
+a(([bc])\2)*d abbccd 0 6 3 5 3 4
+
+a(([bc])\2)*d abbcbd -1 -1
+a((b)*\2)*d abbbd 0 5 1 4 2 3
+; perl only:
+(ab*)[ab]*\1 ababaaa 0 7 0 1
+(a)\1bcd aabcd 0 5 0 1
+(a)\1bc*d aabcd 0 5 0 1
+(a)\1bc*d aabd 0 4 0 1
+(a)\1bc*d aabcccd 0 7 0 1
+(a)\1bc*[ce]d aabcccd 0 7 0 1
+^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5
+
+; posix only: 
+- match_default extended REG_EXTENDED
+(ab*)[ab]*\1 ababaaa 0 7 0 1
+
+;
+; word operators:
+\w a 0 1
+\w z 0 1
+\w A 0 1
+\w Z 0 1
+\w _ 0 1
+\w } -1 -1
+\w ` -1 -1
+\w [ -1 -1
+\w @ -1 -1
+; non-word:
+\W a -1 -1
+\W z -1 -1
+\W A -1 -1
+\W Z -1 -1
+\W _ -1 -1
+\W } 0 1
+\W ` 0 1
+\W [ 0 1
+\W @ 0 1
+; word start:
+\<abcd "  abcd" 2 6
+\<ab cab -1 -1
+\<ab "\nab" 1 3
+\<tag ::tag 2 5
+;word end:
+abc\> abc 0 3
+abc\> abcd -1 -1
+abc\> abc\n 0 3
+abc\> abc:: 0 3
+; word boundary:
+\babcd "  abcd" 2 6
+\bab cab -1 -1
+\bab "\nab" 1 3
+\btag ::tag 2 5
+abc\b abc 0 3
+abc\b abcd -1 -1
+abc\b abc\n 0 3
+abc\b abc:: 0 3
+; within word:
+\B ab 1 1
+a\Bb ab 0 2
+a\B ab 0 1
+a\B a -1 -1
+a\B "a " -1 -1
+
+;
+; buffer operators:
+\`abc abc 0 3
+\`abc \nabc -1 -1
+\`abc " abc" -1 -1
+abc\' abc 0 3
+abc\' abc\n -1 -1
+abc\' "abc " -1 -1
+
+;
+; now follows various complex expressions designed to try and bust the matcher:
+a(((b)))c abc 0 3 1 2 1 2 1 2
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c)d abbd 0 4 1 3
+; just gotta have one DFA-buster, of course
+a[ab]{20} aaaaabaaaabaaaabaaaab 0 21
+; and an inline expansion in case somebody gets tricky
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab] aaaaabaaaabaaaabaaaab 0 21
+; and in case somebody just slips in an NFA...
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night) aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31
+; one really big one
+1234567890123456789012345678901234567890123456789012345678901234567890 a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71
+; fish for problems as brackets go past 8
+[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8
+[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9
+[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10
+[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10
+; and as parenthesis go past 9:
+(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
+(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12
+(a)d|(b)c abc 1 3 -1 -1 1 2
+_+((www)|(ftp)|(mailto)):_* "_wwwnocolon _mailto:" 12 20 13 19 -1 -1 -1 -1 13 19
+
+; subtleties of matching
+;a(b)?c\1d acd 0 3 -1 -1
+; POSIX is about the following test:
+a(b)?c\1d acd -1 -1 -1 -1
+a(b?c)+d accd 0 4 2 3
+(wee|week)(knights|night) weeknights 0 10 0 3 3 10
+.* abc 0 3
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c|e)d abbd 0 4 1 3
+a(b*|c|e)d acd 0 3 1 2
+a(b*|c|e)d ad 0 2 1 1
+a(b?)c abc 0 3 1 2
+a(b?)c ac 0 2 1 1
+a(b+)c abc 0 3 1 2
+a(b+)c abbbc 0 5 1 4 
+a(b*)c ac 0 2 1 1 
+(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5
+a([bc]?)c abc 0 3 1 2
+a([bc]?)c ac 0 2 1 1 
+a([bc]+)c abc 0 3 1 2
+a([bc]+)c abcc 0 4 1 3
+a([bc]+)bc abcbc 0 5 1 3
+a(bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abbb 0 4 1 3
+a(bbb+|bb+|b)bb abbb 0 4 1 2
+(.*).* abcdef 0 6 0 6
+(a*)* bc 0 0 0 0
+xyx*xz xyxxxxyxxxz 5 11
+
+; do we get the right subexpression when it is used more than once?
+a(b|c)*d ad 0 2 -1 -1
+a(b|c)*d abcd 0 4 2 3
+a(b|c)+d abd 0 3 1 2
+a(b|c)+d abcd 0 4 2 3
+a(b|c?)+d ad 0 2 1 1
+a(b|c){0,0}d ad 0 2 -1 -1
+a(b|c){0,1}d ad 0 2 -1 -1
+a(b|c){0,1}d abd 0 3 1 2
+a(b|c){0,2}d ad 0 2 -1 -1
+a(b|c){0,2}d abcd 0 4 2 3
+a(b|c){0,}d ad 0 2 -1 -1
+a(b|c){0,}d abcd 0 4 2 3
+a(b|c){1,1}d abd 0 3 1 2
+a(b|c){1,2}d abd 0 3 1 2
+a(b|c){1,2}d abcd 0 4 2 3
+a(b|c){1,}d abd 0 3 1 2
+a(b|c){1,}d abcd 0 4 2 3
+a(b|c){2,2}d acbd 0 4 2 3
+a(b|c){2,2}d abcd 0 4 2 3
+a(b|c){2,4}d abcd 0 4 2 3
+a(b|c){2,4}d abcbd 0 5 3 4
+a(b|c){2,4}d abcbcd 0 6 4 5
+a(b|c){2,}d abcd 0 4 2 3
+a(b|c){2,}d abcbd 0 5 3 4
+; perl only: these conflict with the POSIX test below
+;a(b|c?)+d abcd 0 4 3 3
+;a(b+|((c)*))+d abd 0 3 2 2 2 2 -1 -1
+;a(b+|((c)*))+d abcd 0 4 3 3 3 3 2 3
+
+; posix only:
+- match_default extended REG_EXTENDED REG_STARTEND
+
+a(b|c?)+d abcd 0 4 2 3
+a(b|((c)*))+d abcd 0 4 2 3 2 3 2 3
+a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1
+a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3
+a(b|((c)*))+d ad 0 2 1 1 1 1 -1 -1
+a(b|((c)*))*d abcd 0 4 2 3 2 3 2 3
+a(b+|((c)*))*d abd 0 3 1 2 -1 -1 -1 -1
+a(b+|((c)*))*d abcd 0 4 2 3 2 3 2 3
+a(b|((c)*))*d ad 0 2 1 1 1 1 -1 -1
+
+- match_default normal REG_PERL
+; try to match C++ syntax elements:
+; line comment:
+//[^\n]* "++i //here is a line comment\n" 4 28
+; block comment:
+/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27
+/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1
+; preprocessor directives:
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 -1 -1
+; perl only:
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\  \r\n  foo();\\\r\n   printf(#x);" 0 53 30 42
+; literals:
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF         						0 4		0 4		0 4 	-1 -1 	-1 -1 	-1 -1 	-1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35 									0 2 	0 2		-1 -1 	0 2 	-1 -1 	-1 -1 	-1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu 								0 5		0 4		0 4 	-1 -1 	-1 -1 	-1 -1 	-1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL 								0 5		0 4		0 4 	-1 -1 	4 5 	-1 -1 	-1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFFFFFFFFFFFFFFFuint64 			0 24	0 18	0 18 	-1 -1 	19 24 	19 24 	22 24
+; strings:
+'([^\\']|\\.)*' '\\x3A' 0 6 4 5
+'([^\\']|\\.)*' '\\'' 0 4 1 3
+'([^\\']|\\.)*' '\\n' 0 4 1 3
+
+; finally try some case insensitive matches:
+- match_default normal REG_EXTENDED REG_ICASE
+; upper and lower have no meaning here so they fail, however these
+; may compile with other libraries...
+;[[:lower:]] !
+;[[:upper:]] !
+0123456789@abcdefghijklmnopqrstuvwxyz\[\\\]\^_`ABCDEFGHIJKLMNOPQRSTUVWXYZ\{\|\} 0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\} 0 72
+
+; known and suspected bugs:
+- match_default normal REG_EXTENDED
+\( ( 0 1
+\) ) 0 1
+\$ $ 0 1
+\^ ^ 0 1
+\. . 0 1
+\* * 0 1
+\+ + 0 1
+\? ? 0 1
+\[ [ 0 1
+\] ] 0 1
+\| | 0 1
+\\ \\ 0 1
+# # 0 1
+\# # 0 1
+a- a- 0 2
+\- - 0 1
+\{ { 0 1
+\} } 0 1
+0 0 0 1
+1 1 0 1
+9 9 0 1
+b b 0 1
+B B 0 1
+< < 0 1
+> > 0 1
+w w 0 1
+W W 0 1
+` ` 0 1
+' ' 0 1
+\n \n 0 1
+, , 0 1
+a a 0 1
+f f 0 1
+n n 0 1
+r r 0 1
+t t 0 1
+v v 0 1
+c c 0 1
+x x 0 1
+: : 0 1
+(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5
+
+- match_default normal REG_EXTENDED REG_ICASE
+a A 0 1
+A a 0 1
+[abc]+ abcABC 0 6
+[ABC]+ abcABC 0 6
+[a-z]+ abcABC 0 6
+[A-Z]+ abzANZ 0 6
+[a-Z]+ abzABZ 0 6
+[A-z]+ abzABZ 0 6
+[[:lower:]]+ abyzABYZ 0 8
+[[:upper:]]+ abzABZ 0 6
+[[:alpha:]]+ abyzABYZ 0 8
+[[:alnum:]]+ 09abyzABYZ 0 10
+
+; word start:
+\<abcd "  abcd" 2 6
+\<ab cab -1 -1
+\<ab "\nab" 1 3
+\<tag ::tag 2 5
+; word end:
+abc\> abc 0 3
+abc\> abcd -1 -1
+abc\> abc\n 0 3
+abc\> abc:: 0 3
+
+; collating elements and rewritten set code:
+- match_default normal REG_EXTENDED REG_STARTEND
+;[[.zero.]] 0 0 1
+;[[.one.]] 1 0 1
+;[[.two.]] 2 0 1
+;[[.three.]] 3 0 1
+[[.a.]] baa 1 2
+;[[.right-curly-bracket.]] } 0 1
+;[[.NUL.]] \0 0 1
+[[:<:]z] !
+[a[:>:]] !
+[[=a=]] a 0 1
+;[[=right-curly-bracket=]] } 0 1
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+[[.A.]] A 0 1
+[[.A.]] a 0 1
+[[.A.]-b]+ AaBb 0 4
+[A-[.b.]]+ AaBb 0 4
+[[.a.]-B]+ AaBb 0 4
+[a-[.B.]]+ AaBb 0 4
+- match_default normal REG_EXTENDED REG_STARTEND
+[[.a.]-c]+ abcd 0 3
+[a-[.c.]]+ abcd 0 3
+[[:alpha:]-a] !
+[a-[:alpha:]] !
+
+; try multi-character ligatures:
+;[[.ae.]] ae 0 2
+;[[.ae.]] aE -1 -1
+;[[.AE.]] AE 0 2
+;[[.Ae.]] Ae 0 2
+;[[.ae.]-b] a -1 -1
+;[[.ae.]-b] b 0 1
+;[[.ae.]-b] ae 0 2
+;[a-[.ae.]] a 0 1
+;[a-[.ae.]] b -1 -1
+;[a-[.ae.]] ae 0 2
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+;[[.ae.]] AE 0 2
+;[[.ae.]] Ae 0 2
+;[[.AE.]] Ae 0 2
+;[[.Ae.]] aE 0 2
+;[[.AE.]-B] a -1 -1
+;[[.Ae.]-b] b 0 1
+;[[.Ae.]-b] B 0 1
+;[[.ae.]-b] AE 0 2
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST
+\s+ "ab   ab" 2 5
+\S+ "  abc  " 2 5
+
+- match_default normal REG_EXTENDED REG_STARTEND
+\`abc abc 0 3
+\`abc aabc -1 -1
+abc\' abc 0 3
+abc\' abcd -1 -1
+abc\' abc\n\n -1 -1
+abc\' abc 0 3
+
+; extended repeat checking to exercise new algorithms:
+ab.*xy abxy_ 0 4
+ab.*xy ab_xy_ 0 5
+ab.*xy abxy 0 4
+ab.*xy ab_xy 0 5
+ab.* ab 0 2
+ab.* ab__ 0 4
+
+ab.{2,5}xy ab__xy_ 0 6
+ab.{2,5}xy ab____xy_ 0 8
+ab.{2,5}xy ab_____xy_ 0 9
+ab.{2,5}xy ab__xy 0 6
+ab.{2,5}xy ab_____xy 0 9
+ab.{2,5} ab__ 0 4
+ab.{2,5} ab_______ 0 7
+ab.{2,5}xy ab______xy -1 -1
+ab.{2,5}xy ab_xy -1 -1
+
+ab.*?xy abxy_ 0 4
+ab.*?xy ab_xy_ 0 5
+ab.*?xy abxy 0 4
+ab.*?xy ab_xy 0 5
+ab.*? ab 0 2
+ab.*? ab__ 0 4
+
+ab.{2,5}?xy ab__xy_ 0 6
+ab.{2,5}?xy ab____xy_ 0 8
+ab.{2,5}?xy ab_____xy_ 0 9
+ab.{2,5}?xy ab__xy 0 6
+ab.{2,5}?xy ab_____xy 0 9
+ab.{2,5}? ab__ 0 4
+ab.{2,5}? ab_______ 0 7
+ab.{2,5}?xy ab______xy -1 -1
+ab.{2,5}xy ab_xy -1 -1
+
+; again but with slower algorithm variant:
+- match_default REG_EXTENDED
+; now again for single character repeats:
+
+ab_*xy abxy_ 0 4
+ab_*xy ab_xy_ 0 5
+ab_*xy abxy 0 4
+ab_*xy ab_xy 0 5
+ab_* ab 0 2
+ab_* ab__ 0 4
+
+ab_{2,5}xy ab__xy_ 0 6
+ab_{2,5}xy ab____xy_ 0 8
+ab_{2,5}xy ab_____xy_ 0 9
+ab_{2,5}xy ab__xy 0 6
+ab_{2,5}xy ab_____xy 0 9
+ab_{2,5} ab__ 0 4
+ab_{2,5} ab_______ 0 7
+ab_{2,5}xy ab______xy -1 -1
+ab_{2,5}xy ab_xy -1 -1
+
+ab_*?xy abxy_ 0 4
+ab_*?xy ab_xy_ 0 5
+ab_*?xy abxy 0 4
+ab_*?xy ab_xy 0 5
+ab_*? ab 0 2
+ab_*? ab__ 0 4
+
+ab_{2,5}?xy ab__xy_ 0 6
+ab_{2,5}?xy ab____xy_ 0 8
+ab_{2,5}?xy ab_____xy_ 0 9
+ab_{2,5}?xy ab__xy 0 6
+ab_{2,5}?xy ab_____xy 0 9
+ab_{2,5}? ab__ 0 4
+ab_{2,5}? ab_______ 0 7
+ab_{2,5}?xy ab______xy -1 -1
+ab_{2,5}xy ab_xy -1 -1
+
+; and again for sets:
+ab[_,;]*xy abxy_ 0 4
+ab[_,;]*xy ab_xy_ 0 5
+ab[_,;]*xy abxy 0 4
+ab[_,;]*xy ab_xy 0 5
+ab[_,;]* ab 0 2
+ab[_,;]* ab__ 0 4
+
+ab[_,;]{2,5}xy ab__xy_ 0 6
+ab[_,;]{2,5}xy ab____xy_ 0 8
+ab[_,;]{2,5}xy ab_____xy_ 0 9
+ab[_,;]{2,5}xy ab__xy 0 6
+ab[_,;]{2,5}xy ab_____xy 0 9
+ab[_,;]{2,5} ab__ 0 4
+ab[_,;]{2,5} ab_______ 0 7
+ab[_,;]{2,5}xy ab______xy -1 -1
+ab[_,;]{2,5}xy ab_xy -1 -1
+
+ab[_,;]*?xy abxy_ 0 4
+ab[_,;]*?xy ab_xy_ 0 5
+ab[_,;]*?xy abxy 0 4
+ab[_,;]*?xy ab_xy 0 5
+ab[_,;]*? ab 0 2
+ab[_,;]*? ab__ 0 4
+
+ab[_,;]{2,5}?xy ab__xy_ 0 6
+ab[_,;]{2,5}?xy ab____xy_ 0 8
+ab[_,;]{2,5}?xy ab_____xy_ 0 9
+ab[_,;]{2,5}?xy ab__xy 0 6
+ab[_,;]{2,5}?xy ab_____xy 0 9
+ab[_,;]{2,5}? ab__ 0 4
+ab[_,;]{2,5}? ab_______ 0 7
+ab[_,;]{2,5}?xy ab______xy -1 -1
+ab[_,;]{2,5}xy ab_xy -1 -1
+
+; and again for tricky sets with digraphs:
+;ab[_[.ae.]]*xy abxy_ 0 4
+;ab[_[.ae.]]*xy ab_xy_ 0 5
+;ab[_[.ae.]]*xy abxy 0 4
+;ab[_[.ae.]]*xy ab_xy 0 5
+;ab[_[.ae.]]* ab 0 2
+;ab[_[.ae.]]* ab__ 0 4
+
+;ab[_[.ae.]]{2,5}xy ab__xy_ 0 6
+;ab[_[.ae.]]{2,5}xy ab____xy_ 0 8
+;ab[_[.ae.]]{2,5}xy ab_____xy_ 0 9
+;ab[_[.ae.]]{2,5}xy ab__xy 0 6
+;ab[_[.ae.]]{2,5}xy ab_____xy 0 9
+;ab[_[.ae.]]{2,5} ab__ 0 4
+;ab[_[.ae.]]{2,5} ab_______ 0 7
+;ab[_[.ae.]]{2,5}xy ab______xy -1 -1
+;ab[_[.ae.]]{2,5}xy ab_xy -1 -1
+
+;ab[_[.ae.]]*?xy abxy_ 0 4
+;ab[_[.ae.]]*?xy ab_xy_ 0 5
+;ab[_[.ae.]]*?xy abxy 0 4
+;ab[_[.ae.]]*?xy ab_xy 0 5
+;ab[_[.ae.]]*? ab 0 2
+;ab[_[.ae.]]*? ab__ 0 2
+
+;ab[_[.ae.]]{2,5}?xy ab__xy_ 0 6
+;ab[_[.ae.]]{2,5}?xy ab____xy_ 0 8
+;ab[_[.ae.]]{2,5}?xy ab_____xy_ 0 9
+;ab[_[.ae.]]{2,5}?xy ab__xy 0 6
+;ab[_[.ae.]]{2,5}?xy ab_____xy 0 9
+;ab[_[.ae.]]{2,5}? ab__ 0 4
+;ab[_[.ae.]]{2,5}? ab_______ 0 4
+;ab[_[.ae.]]{2,5}?xy ab______xy -1 -1
+;ab[_[.ae.]]{2,5}xy ab_xy -1 -1
+
+; new bugs detected in spring 2003:
+- normal match_continuous REG_NO_POSIX_TEST
+b abc 1 2
+
+() abc 0 0 0 0
+^() abc 0 0 0 0
+^()+ abc 0 0 0 0
+^(){1} abc 0 0 0 0
+^(){2} abc 0 0 0 0
+^((){2}) abc 0 0 0 0 0 0
+() "" 0 0 0 0
+()\1 "" 0 0 0 0
+()\1 a 0 0 0 0
+a()\1b ab 0 2 1 1
+a()b\1 ab 0 2 1 1
+
+; subtleties of matching with no sub-expressions marked
+- normal match_nosubs REG_NO_POSIX_TEST
+a(b?c)+d accd 0 4 
+(wee|week)(knights|night) weeknights 0 10 
+.* abc 0 3
+a(b|(c))d abd 0 3 
+a(b|(c))d acd 0 3
+a(b*|c|e)d abbd 0 4
+a(b*|c|e)d acd 0 3 
+a(b*|c|e)d ad 0 2
+a(b?)c abc 0 3
+a(b?)c ac 0 2
+a(b+)c abc 0 3
+a(b+)c abbbc 0 5
+a(b*)c ac 0 2
+(a|ab)(bc([de]+)f|cde) abcdef 0 6
+a([bc]?)c abc 0 3
+a([bc]?)c ac 0 2
+a([bc]+)c abc 0 3
+a([bc]+)c abcc 0 4
+a([bc]+)bc abcbc 0 5
+a(bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abbb 0 4
+a(bbb+|bb+|b)bb abbb 0 4
+(.*).* abcdef 0 6
+(a*)* bc 0 0
+
+- normal nosubs REG_NO_POSIX_TEST
+a(b?c)+d accd 0 4 
+(wee|week)(knights|night) weeknights 0 10 
+.* abc 0 3
+a(b|(c))d abd 0 3 
+a(b|(c))d acd 0 3
+a(b*|c|e)d abbd 0 4
+a(b*|c|e)d acd 0 3 
+a(b*|c|e)d ad 0 2
+a(b?)c abc 0 3
+a(b?)c ac 0 2
+a(b+)c abc 0 3
+a(b+)c abbbc 0 5
+a(b*)c ac 0 2
+(a|ab)(bc([de]+)f|cde) abcdef 0 6
+a([bc]?)c abc 0 3
+a([bc]?)c ac 0 2
+a([bc]+)c abc 0 3
+a([bc]+)c abcc 0 4
+a([bc]+)bc abcbc 0 5
+a(bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abbb 0 4
+a(bbb+|bb+|b)bb abbb 0 4
+(.*).* abcdef 0 6
+(a*)* bc 0 0
+
diff --git a/test/src/regex/regex-resources/PCRE.tests b/test/src/regex/regex-resources/PCRE.tests
new file mode 100644
index 0000000..0fb9cad
--- /dev/null
+++ b/test/src/regex/regex-resources/PCRE.tests
@@ -0,0 +1,2386 @@
+# PCRE version 4.4 21-August-2003
+
+# Tests taken from PCRE and modified to suit glibc regex.
+#
+# PCRE LICENCE
+# ------------
+#
+# PCRE is a library of functions to support regular expressions whose syntax
+# and semantics are as close as possible to those of the Perl 5 language.
+#
+# Written by: Philip Hazel <ph10@cam.ac.uk>
+#
+# University of Cambridge Computing Service,
+# Cambridge, England. Phone: +44 1223 334714.
+#
+# Copyright (c) 1997-2003 University of Cambridge
+#
+# Permission is granted to anyone to use this software for any purpose on any
+# computer system, and to redistribute it freely, subject to the following
+# restrictions:
+#
+# 1. This software is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# 2. The origin of this software must not be misrepresented, either by
+#    explicit claim or by omission. In practice, this means that if you use
+#    PCRE in software that you distribute to others, commercially or
+#    otherwise, you must put a sentence like this
+#
+#      Regular expression support is provided by the PCRE library package,
+#      which is open source software, written by Philip Hazel, and copyright
+#      by the University of Cambridge, England.
+#
+#    somewhere reasonably visible in your documentation and in any relevant
+#    files or online help data or similar. A reference to the ftp site for
+#    the source, that is, to
+#
+#      ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
+#
+#    should also be given in the documentation. However, this condition is not
+#    intended to apply to whole chains of software. If package A includes PCRE,
+#    it must acknowledge it, but if package B is software that includes package
+#    A, the condition is not imposed on package B (unless it uses PCRE
+#    independently).
+#
+# 3. Altered versions must be plainly marked as such, and must not be
+#    misrepresented as being the original software.
+#
+# 4. If PCRE is embedded in any software that is released under the GNU
+#   General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL),
+#   then the terms of that licence shall supersede any condition above with
+#   which it is incompatible.
+#
+# The documentation for PCRE, supplied in the "doc" directory, is distributed
+# under the same terms as the software itself.
+#
+# End
+#
+
+/the quick brown fox/
+    the quick brown fox
+ 0: the quick brown fox
+    The quick brown FOX
+No match
+    What do you know about the quick brown fox?
+ 0: the quick brown fox
+    What do you know about THE QUICK BROWN FOX?
+No match
+
+/The quick brown fox/i
+    the quick brown fox
+ 0: the quick brown fox
+    The quick brown FOX
+ 0: The quick brown FOX
+    What do you know about the quick brown fox?
+ 0: the quick brown fox
+    What do you know about THE QUICK BROWN FOX?
+ 0: THE QUICK BROWN FOX
+
+/a*abc?xyz+pqr{3}ab{2,}xy{4,5}pq{0,6}AB{0,}zz/
+    abxyzpqrrrabbxyyyypqAzz
+ 0: abxyzpqrrrabbxyyyypqAzz
+    abxyzpqrrrabbxyyyypqAzz
+ 0: abxyzpqrrrabbxyyyypqAzz
+    aabxyzpqrrrabbxyyyypqAzz
+ 0: aabxyzpqrrrabbxyyyypqAzz
+    aaabxyzpqrrrabbxyyyypqAzz
+ 0: aaabxyzpqrrrabbxyyyypqAzz
+    aaaabxyzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzpqrrrabbxyyyypqAzz
+    abcxyzpqrrrabbxyyyypqAzz
+ 0: abcxyzpqrrrabbxyyyypqAzz
+    aabcxyzpqrrrabbxyyyypqAzz
+ 0: aabcxyzpqrrrabbxyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypAzz
+ 0: aaabcxyzpqrrrabbxyyyypAzz
+    aaabcxyzpqrrrabbxyyyypqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqqqAzz
+    aaaabcxyzpqrrrabbxyyyypqAzz
+ 0: aaaabcxyzpqrrrabbxyyyypqAzz
+    abxyzzpqrrrabbxyyyypqAzz
+ 0: abxyzzpqrrrabbxyyyypqAzz
+    aabxyzzzpqrrrabbxyyyypqAzz
+ 0: aabxyzzzpqrrrabbxyyyypqAzz
+    aaabxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaabxyzzzzpqrrrabbxyyyypqAzz
+    aaaabxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzzzzpqrrrabbxyyyypqAzz
+    abcxyzzpqrrrabbxyyyypqAzz
+ 0: abcxyzzpqrrrabbxyyyypqAzz
+    aabcxyzzzpqrrrabbxyyyypqAzz
+ 0: aabcxyzzzpqrrrabbxyyyypqAzz
+    aaabcxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaabcxyzzzzpqrrrabbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbbxyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbbxyyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbbxyyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypABzz
+ 0: aaabcxyzpqrrrabbxyyyypABzz
+    aaabcxyzpqrrrabbxyyyypABBzz
+ 0: aaabcxyzpqrrrabbxyyyypABBzz
+    >>>aaabxyzpqrrrabbxyyyypqAzz
+ 0: aaabxyzpqrrrabbxyyyypqAzz
+    >aaaabxyzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzpqrrrabbxyyyypqAzz
+    >>>>abcxyzpqrrrabbxyyyypqAzz
+ 0: abcxyzpqrrrabbxyyyypqAzz
+    *** Failers
+No match
+    abxyzpqrrabbxyyyypqAzz
+No match
+    abxyzpqrrrrabbxyyyypqAzz
+No match
+    abxyzpqrrrabxyyyypqAzz
+No match
+    aaaabcxyzzzzpqrrrabbbxyyyyyypqAzz
+No match
+    aaaabcxyzzzzpqrrrabbbxyyypqAzz
+No match
+    aaabcxyzpqrrrabbxyyyypqqqqqqqAzz
+No match
+
+/^(abc){1,2}zz/
+    abczz
+ 0: abczz
+ 1: abc
+    abcabczz
+ 0: abcabczz
+ 1: abc
+    *** Failers
+No match
+    zz
+No match
+    abcabcabczz
+No match
+    >>abczz
+No match
+
+/^(b+|a){1,2}c/
+    bc
+ 0: bc
+ 1: b
+    bbc
+ 0: bbc
+ 1: bb
+    bbbc
+ 0: bbbc
+ 1: bbb
+    bac
+ 0: bac
+ 1: a
+    bbac
+ 0: bbac
+ 1: a
+    aac
+ 0: aac
+ 1: a
+    abbbbbbbbbbbc
+ 0: abbbbbbbbbbbc
+ 1: bbbbbbbbbbb
+    bbbbbbbbbbbac
+ 0: bbbbbbbbbbbac
+ 1: a
+    *** Failers
+No match
+    aaac
+No match
+    abbbbbbbbbbbac
+No match
+
+/^[]cde]/
+    ]thing
+ 0: ]
+    cthing
+ 0: c
+    dthing
+ 0: d
+    ething
+ 0: e
+    *** Failers
+No match
+    athing
+No match
+    fthing
+No match
+
+/^[^]cde]/
+    athing
+ 0: a
+    fthing
+ 0: f
+    *** Failers
+ 0: *
+    ]thing
+No match
+    cthing
+No match
+    dthing
+No match
+    ething
+No match
+
+/^[0-9]+$/
+    0
+ 0: 0
+    1
+ 0: 1
+    2
+ 0: 2
+    3
+ 0: 3
+    4
+ 0: 4
+    5
+ 0: 5
+    6
+ 0: 6
+    7
+ 0: 7
+    8
+ 0: 8
+    9
+ 0: 9
+    10
+ 0: 10
+    100
+ 0: 100
+    *** Failers
+No match
+    abc
+No match
+
+/^.*nter/
+    enter
+ 0: enter
+    inter
+ 0: inter
+    uponter
+ 0: uponter
+
+/^xxx[0-9]+$/
+    xxx0
+ 0: xxx0
+    xxx1234
+ 0: xxx1234
+    *** Failers
+No match
+    xxx
+No match
+
+/^.+[0-9][0-9][0-9]$/
+    x123
+ 0: x123
+    xx123
+ 0: xx123
+    123456
+ 0: 123456
+    *** Failers
+No match
+    123
+No match
+    x1234
+ 0: x1234
+
+/^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
+    abc!pqr=apquxz.ixr.zzz.ac.uk
+ 0: abc!pqr=apquxz.ixr.zzz.ac.uk
+ 1: abc
+ 2: pqr
+    *** Failers
+No match
+    !pqr=apquxz.ixr.zzz.ac.uk
+No match
+    abc!=apquxz.ixr.zzz.ac.uk
+No match
+    abc!pqr=apquxz:ixr.zzz.ac.uk
+No match
+    abc!pqr=apquxz.ixr.zzz.ac.ukk
+No match
+
+/:/
+    Well, we need a colon: somewhere
+ 0: :
+    *** Fail if we don't
+No match
+
+/([0-9a-f:]+)$/i
+    0abc
+ 0: 0abc
+ 1: 0abc
+    abc
+ 0: abc
+ 1: abc
+    fed
+ 0: fed
+ 1: fed
+    E
+ 0: E
+ 1: E
+    ::
+ 0: ::
+ 1: ::
+    5f03:12C0::932e
+ 0: 5f03:12C0::932e
+ 1: 5f03:12C0::932e
+    fed def
+ 0: def
+ 1: def
+    Any old stuff
+ 0: ff
+ 1: ff
+    *** Failers
+No match
+    0zzz
+No match
+    gzzz
+No match
+    Any old rubbish
+No match
+
+/^.*\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/
+    .1.2.3
+ 0: .1.2.3
+ 1: 1
+ 2: 2
+ 3: 3
+    A.12.123.0
+ 0: A.12.123.0
+ 1: 12
+ 2: 123
+ 3: 0
+    *** Failers
+No match
+    .1.2.3333
+No match
+    1.2.3
+No match
+    1234.2.3
+No match
+
+/^([0-9]+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/
+    1 IN SOA non-sp1 non-sp2(
+ 0: 1 IN SOA non-sp1 non-sp2(
+ 1: 1
+ 2: non-sp1
+ 3: non-sp2
+    1    IN    SOA    non-sp1    non-sp2   (
+ 0: 1    IN    SOA    non-sp1    non-sp2   (
+ 1: 1
+ 2: non-sp1
+ 3: non-sp2
+    *** Failers
+No match
+    1IN SOA non-sp1 non-sp2(
+No match
+
+/^[a-zA-Z0-9][a-zA-Z0-9-]*(\.[a-zA-Z0-9][a-zA-z0-9-]*)*\.$/
+    a.
+ 0: a.
+    Z.
+ 0: Z.
+    2.
+ 0: 2.
+    ab-c.pq-r.
+ 0: ab-c.pq-r.
+ 1: .pq-r
+    sxk.zzz.ac.uk.
+ 0: sxk.zzz.ac.uk.
+ 1: .uk
+    x-.y-.
+ 0: x-.y-.
+ 1: .y-
+    *** Failers
+No match
+    -abc.peq.
+No match
+
+/^\*\.[a-z]([a-z0-9-]*[a-z0-9]+)?(\.[a-z]([a-z0-9-]*[a-z0-9]+)?)*$/
+    *.a
+ 0: *.a
+    *.b0-a
+ 0: *.b0-a
+ 1: 0-a
+    *.c3-b.c
+ 0: *.c3-b.c
+ 1: 3-b
+ 2: .c
+    *.c-a.b-c
+ 0: *.c-a.b-c
+ 1: -a
+ 2: .b-c
+ 3: -c
+    *** Failers
+No match
+    *.0
+No match
+    *.a-
+No match
+    *.a-b.c-
+No match
+    *.c-a.0-c
+No match
+
+/^[0-9a-f](\.[0-9a-f])*$/i
+    a.b.c.d
+ 0: a.b.c.d
+ 1: .d
+    A.B.C.D
+ 0: A.B.C.D
+ 1: .D
+    a.b.c.1.2.3.C
+ 0: a.b.c.1.2.3.C
+ 1: .C
+
+/^".*"\s*(;.*)?$/
+    "1234"
+ 0: "1234"
+    "abcd" ;
+ 0: "abcd" ;
+ 1: ;
+    "" ; rhubarb
+ 0: "" ; rhubarb
+ 1: ; rhubarb
+    *** Failers
+No match
+    "1234" : things
+No match
+
+/^(a(b(c)))(d(e(f)))(h(i(j)))(k(l(m)))$/
+    abcdefhijklm
+ 0: abcdefhijklm
+ 1: abc
+ 2: bc
+ 3: c
+ 4: def
+ 5: ef
+ 6: f
+ 7: hij
+ 8: ij
+ 9: j
+10: klm
+11: lm
+12: m
+
+/^a*\w/
+    z
+ 0: z
+    az
+ 0: az
+    aaaz
+ 0: aaaz
+    a
+ 0: a
+    aa
+ 0: aa
+    aaaa
+ 0: aaaa
+    a+
+ 0: a
+    aa+
+ 0: aa
+
+/^a+\w/
+    az
+ 0: az
+    aaaz
+ 0: aaaz
+    aa
+ 0: aa
+    aaaa
+ 0: aaaa
+    aa+
+ 0: aa
+
+/^[0-9]{8}\w{2,}/
+    1234567890
+ 0: 1234567890
+    12345678ab
+ 0: 12345678ab
+    12345678__
+ 0: 12345678__
+    *** Failers
+No match
+    1234567
+No match
+
+/^[aeiou0-9]{4,5}$/
+    uoie
+ 0: uoie
+    1234
+ 0: 1234
+    12345
+ 0: 12345
+    aaaaa
+ 0: aaaaa
+    *** Failers
+No match
+    123456
+No match
+
+/\`(abc|def)=(\1){2,3}\'/
+    abc=abcabc
+ 0: abc=abcabc
+ 1: abc
+ 2: abc
+    def=defdefdef
+ 0: def=defdefdef
+ 1: def
+ 2: def
+    *** Failers
+No match
+    abc=defdef
+No match
+
+/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
+    cataract cataract23
+ 0: cataract cataract23
+ 1: cataract
+ 2: aract
+ 3: ract
+ 4: 
+ 5: 3
+    catatonic catatonic23
+ 0: catatonic catatonic23
+ 1: catatonic
+ 2: atonic
+ 3: tonic
+ 4: 
+ 5: 3
+    caterpillar caterpillar23
+ 0: caterpillar caterpillar23
+ 1: caterpillar
+ 2: erpillar
+ 3: <unset>
+ 4: 
+ 5: 3
+
+
+/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
+    From abcd  Mon Sep 01 12:33:02 1997
+ 0: From abcd  Mon Sep 01 12:33
+ 1: abcd
+
+/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}[0-9]{1,2}\s+[0-9][0-9]:[0-9][0-9]/
+    From abcd  Mon Sep 01 12:33:02 1997
+ 0: From abcd  Mon Sep 01 12:33
+ 1: Sep 
+    From abcd  Mon Sep  1 12:33:02 1997
+ 0: From abcd  Mon Sep  1 12:33
+ 1: Sep  
+    *** Failers
+No match
+    From abcd  Sep 01 12:33:02 1997
+No match
+
+/^(a)\1{2,3}(.)/
+    aaab
+ 0: aaab
+ 1: a
+ 2: b
+    aaaab
+ 0: aaaab
+ 1: a
+ 2: b
+    aaaaab
+ 0: aaaaa
+ 1: a
+ 2: a
+    aaaaaab
+ 0: aaaaa
+ 1: a
+ 2: a
+
+/^[ab]{1,3}(ab*|b)/
+    aabbbbb
+ 0: aabbbbb
+ 1: abbbbb
+
+/^(cow|)\1(bell)/
+    cowcowbell
+ 0: cowcowbell
+ 1: cow
+ 2: bell
+    bell
+ 0: bell
+ 1: 
+ 2: bell
+    *** Failers
+No match
+    cowbell
+No match
+
+/^(a|)\1+b/
+    aab
+ 0: aab
+ 1: a
+    aaaab
+ 0: aaaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+
+/^(a|)\1{2}b/
+    aaab
+ 0: aaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+    aab
+No match
+    aaaab
+No match
+
+/^(a|)\1{2,3}b/
+    aaab
+ 0: aaab
+ 1: a
+    aaaab
+ 0: aaaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+    aab
+No match
+    aaaaab
+No match
+
+/ab{1,3}bc/
+    abbbbc
+ 0: abbbbc
+    abbbc
+ 0: abbbc
+    abbc
+ 0: abbc
+    *** Failers
+No match
+    abc
+No match
+    abbbbbc
+No match
+
+/([^.]*)\.([^:]*):[T ]+(.*)/
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/([^.]*)\.([^:]*):[T ]+(.*)/i
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/([^.]*)\.([^:]*):[t ]+(.*)/i
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/^abc$/
+    abc
+ 0: abc
+    *** Failers
+No match
+
+/[-az]+/
+    az-
+ 0: az-
+    *** Failers
+ 0: a
+    b
+No match
+
+/[az-]+/
+    za-
+ 0: za-
+    *** Failers
+ 0: a
+    b
+No match
+
+/[a-z]+/
+    abcdxyz
+ 0: abcdxyz
+
+/[0-9-]+/
+    12-34
+ 0: 12-34
+    *** Failers
+No match
+    aaa
+No match
+
+/(abc)\1/i
+    abcabc
+ 0: abcabc
+ 1: abc
+    ABCabc
+ 0: ABCabc
+ 1: ABC
+    abcABC
+ 0: abcABC
+ 1: abc
+
+/a{0}bc/
+    bc
+ 0: bc
+
+/^([^a])([^b])([^c]*)([^d]{3,4})/
+    baNOTccccd
+ 0: baNOTcccc
+ 1: b
+ 2: a
+ 3: NOT
+ 4: cccc
+    baNOTcccd
+ 0: baNOTccc
+ 1: b
+ 2: a
+ 3: NOT
+ 4: ccc
+    baNOTccd
+ 0: baNOTcc
+ 1: b
+ 2: a
+ 3: NO
+ 4: Tcc
+    bacccd
+ 0: baccc
+ 1: b
+ 2: a
+ 3: 
+ 4: ccc
+    *** Failers
+ 0: *** Failers
+ 1: *
+ 2: *
+ 3: * Fail
+ 4: ers
+    anything
+No match
+    baccd
+No match
+
+/[^a]/
+    Abc
+ 0: A
+
+/[^a]/i
+    Abc 
+ 0: b
+
+/[^a]+/
+    AAAaAbc
+ 0: AAA
+
+/[^a]+/i
+    AAAaAbc
+ 0: bc
+
+/[^k]$/
+    abc
+ 0: c
+    *** Failers
+ 0: s
+    abk
+No match
+
+/[^k]{2,3}$/
+    abc
+ 0: abc
+    kbc
+ 0: bc
+    kabc
+ 0: abc
+    *** Failers
+ 0: ers
+    abk
+No match
+    akb
+No match
+    akk 
+No match
+
+/^[0-9]{8,}@.+[^k]$/
+    12345678@a.b.c.d
+ 0: 12345678@a.b.c.d
+    123456789@x.y.z
+ 0: 123456789@x.y.z
+    *** Failers
+No match
+    12345678@x.y.uk
+No match
+    1234567@a.b.c.d       
+No match
+
+/(a)\1{8,}/
+    aaaaaaaaa
+ 0: aaaaaaaaa
+ 1: a
+    aaaaaaaaaa
+ 0: aaaaaaaaaa
+ 1: a
+    *** Failers
+No match
+    aaaaaaa   
+No match
+
+/[^a]/
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: A
+
+/[^a]/i
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: b
+
+/[^az]/
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: A
+
+/[^az]/i
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: b
+
+/P[^*]TAIRE[^*]{1,6}LL/
+    xxxxxxxxxxxPSTAIREISLLxxxxxxxxx
+ 0: PSTAIREISLL
+
+/P[^*]TAIRE[^*]{1,}LL/
+    xxxxxxxxxxxPSTAIREISLLxxxxxxxxx
+ 0: PSTAIREISLL
+
+/(\.[0-9][0-9][1-9]?)[0-9]+/
+    1.230003938
+ 0: .230003938
+ 1: .23
+    1.875000282   
+ 0: .875000282
+ 1: .875
+    1.235  
+ 0: .235
+ 1: .23
+                  
+/\b(foo)\s+(\w+)/i
+    Food is on the foo table
+ 0: foo table
+ 1: foo
+ 2: table
+    
+/foo(.*)bar/
+    The food is under the bar in the barn.
+ 0: food is under the bar in the bar
+ 1: d is under the bar in the 
+    
+/(.*)([0-9]*)/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 53147
+ 2: 
+    
+/(.*)([0-9]+)/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 5314
+ 2: 7
+
+/(.*)([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 5314
+ 2: 7
+
+/(.*)\b([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 
+ 2: 53147
+
+/(.*[^0-9])([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 
+ 2: 53147
+
+/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]][[:digit:]][[:digit:]]/
+    01/01/2000
+ 0: 01/01/2000
+
+/^(a){0,0}/
+    bcd
+ 0: 
+    abc
+ 0: 
+    aab     
+ 0: 
+
+/^(a){0,1}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: a
+ 1: a
+
+/^(a){0,2}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: aa
+ 1: a
+
+/^(a){0,3}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa   
+ 0: aaa
+ 1: a
+
+/^(a){0,}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa
+ 0: aaa
+ 1: a
+    aaaaaaaa    
+ 0: aaaaaaaa
+ 1: a
+
+/^(a){1,1}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: a
+ 1: a
+
+/^(a){1,2}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: aa
+ 1: a
+
+/^(a){1,3}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa   
+ 0: aaa
+ 1: a
+
+/^(a){1,}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa
+ 0: aaa
+ 1: a
+    aaaaaaaa    
+ 0: aaaaaaaa
+ 1: a
+
+/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
+    123456654321
+ 0: 123456654321
+
+/^[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]/
+    123456654321 
+ 0: 123456654321
+
+/^[abc]{12}/
+    abcabcabcabc
+ 0: abcabcabcabc
+    
+/^[a-c]{12}/
+    abcabcabcabc
+ 0: abcabcabcabc
+    
+/^(a|b|c){12}/
+    abcabcabcabc 
+ 0: abcabcabcabc
+ 1: c
+
+/^[abcdefghijklmnopqrstuvwxy0123456789]/
+    n
+ 0: n
+    *** Failers 
+No match
+    z 
+No match
+
+/abcde{0,0}/
+    abcd
+ 0: abcd
+    *** Failers
+No match
+    abce  
+No match
+
+/ab[cd]{0,0}e/
+    abe
+ 0: abe
+    *** Failers
+No match
+    abcde 
+No match
+    
+/ab(c){0,0}d/
+    abd
+ 0: abd
+    *** Failers
+No match
+    abcd   
+No match
+
+/a(b*)/
+    a
+ 0: a
+ 1: 
+    ab
+ 0: ab
+ 1: b
+    abbbb
+ 0: abbbb
+ 1: bbbb
+    *** Failers
+ 0: a
+ 1: 
+    bbbbb    
+No match
+    
+/ab[0-9]{0}e/
+    abe
+ 0: abe
+    *** Failers
+No match
+    ab1e   
+No match
+    
+/(A|B)*CD/
+    CD 
+ 0: CD
+
+/(AB)*\1/
+    ABABAB
+ 0: ABABAB
+ 1: AB
+    
+/([0-9]+)(\w)/
+    12345a
+ 0: 12345a
+ 1: 12345
+ 2: a
+    12345+ 
+ 0: 12345
+ 1: 1234
+ 2: 5
+
+/(abc|)+/
+    abc
+ 0: abc
+ 1: abc
+    abcabc
+ 0: abcabc
+ 1: abc
+    abcabcabc
+ 0: abcabcabc
+ 1: abc
+    xyz      
+ 0: 
+ 1: 
+
+/([a]*)*/
+    a
+ 0: a
+ 1: a
+    aaaaa 
+ 0: aaaaa
+ 1: aaaaa
+
+/([ab]*)*/
+    a
+ 0: a
+ 1: a
+    b
+ 0: b
+ 1: b
+    ababab
+ 0: ababab
+ 1: ababab
+    aaaabcde
+ 0: aaaab
+ 1: aaaab
+    bbbb    
+ 0: bbbb
+ 1: bbbb
+
+/([^a]*)*/
+    b
+ 0: b
+ 1: b
+    bbbb
+ 0: bbbb
+ 1: bbbb
+    aaa   
+ 0: 
+
+/([^ab]*)*/
+    cccc
+ 0: cccc
+ 1: cccc
+    abab  
+ 0: 
+
+/abc/
+    abc
+ 0: abc
+    xabcy
+ 0: abc
+    ababc
+ 0: abc
+    *** Failers
+No match
+    xbc
+No match
+    axc
+No match
+    abx
+No match
+
+/ab*c/
+    abc
+ 0: abc
+
+/ab*bc/
+    abc
+ 0: abc
+    abbc
+ 0: abbc
+    abbbbc
+ 0: abbbbc
+
+/.{1}/
+    abbbbc
+ 0: a
+
+/.{3,4}/
+    abbbbc
+ 0: abbb
+
+/ab{0,}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab+bc/
+    abbc
+ 0: abbc
+    *** Failers
+No match
+    abc
+No match
+    abq
+No match
+
+/ab+bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{1,}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{1,3}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{3,4}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{4,5}bc/
+    *** Failers
+No match
+    abq
+No match
+    abbbbc
+No match
+
+/ab?bc/
+    abbc
+ 0: abbc
+    abc
+ 0: abc
+
+/ab{0,1}bc/
+    abc
+ 0: abc
+
+/ab?c/
+    abc
+ 0: abc
+
+/ab{0,1}c/
+    abc
+ 0: abc
+
+/^abc$/
+    abc
+ 0: abc
+    *** Failers
+No match
+    abbbbc
+No match
+    abcc
+No match
+
+/^abc/
+    abcc
+ 0: abc
+
+/abc$/
+    aabc
+ 0: abc
+    *** Failers
+No match
+    aabc
+ 0: abc
+    aabcd
+No match
+
+/^/
+    abc
+ 0: 
+
+/$/
+    abc
+ 0: 
+
+/a.c/
+    abc
+ 0: abc
+    axc
+ 0: axc
+
+/a.*c/
+    axyzc
+ 0: axyzc
+
+/a[bc]d/
+    abd
+ 0: abd
+    *** Failers
+No match
+    axyzd
+No match
+    abc
+No match
+
+/a[b-d]e/
+    ace
+ 0: ace
+
+/a[b-d]/
+    aac
+ 0: ac
+
+/a[-b]/
+    a-
+ 0: a-
+
+/a[b-]/
+    a-
+ 0: a-
+
+/a[]]b/
+    a]b
+ 0: a]b
+
+/a[^bc]d/
+    aed
+ 0: aed
+    *** Failers
+No match
+    abd
+No match
+    abd
+No match
+
+/a[^-b]c/
+    adc
+ 0: adc
+
+/a[^]b]c/
+    adc
+ 0: adc
+    *** Failers
+No match
+    a-c
+ 0: a-c
+    a]c
+No match
+
+/\ba\b/
+    a-
+ 0: a
+    -a
+ 0: a
+    -a-
+ 0: a
+
+/\by\b/
+    *** Failers
+No match
+    xy
+No match
+    yz
+No match
+    xyz
+No match
+
+/\Ba\B/
+    *** Failers
+ 0: a
+    a-
+No match
+    -a
+No match
+    -a-
+No match
+
+/\By\b/
+    xy
+ 0: y
+
+/\by\B/
+    yz
+ 0: y
+
+/\By\B/
+    xyz
+ 0: y
+
+/\w/
+    a
+ 0: a
+
+/\W/
+    -
+ 0: -
+    *** Failers
+ 0: *
+    -
+ 0: -
+    a
+No match
+
+/a\sb/
+    a b
+ 0: a b
+
+/a\Sb/
+    a-b
+ 0: a-b
+    *** Failers
+No match
+    a-b
+ 0: a-b
+    a b
+No match
+
+/[0-9]/
+    1
+ 0: 1
+
+/[^0-9]/
+    -
+ 0: -
+    *** Failers
+ 0: *
+    -
+ 0: -
+    1
+No match
+
+/ab|cd/
+    abc
+ 0: ab
+    abcd
+ 0: ab
+
+/()ef/
+    def
+ 0: ef
+ 1: 
+
+/a\(b/
+    a(b
+ 0: a(b
+
+/a\(*b/
+    ab
+ 0: ab
+    a((b
+ 0: a((b
+
+/((a))/
+    abc
+ 0: a
+ 1: a
+ 2: a
+
+/(a)b(c)/
+    abc
+ 0: abc
+ 1: a
+ 2: c
+
+/a+b+c/
+    aabbabc
+ 0: abc
+
+/a{1,}b{1,}c/
+    aabbabc
+ 0: abc
+
+/(a+|b)*/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b){0,}/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b)+/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b){1,}/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b)?/
+    ab
+ 0: a
+ 1: a
+
+/(a+|b){0,1}/
+    ab
+ 0: a
+ 1: a
+
+/[^ab]*/
+    cde
+ 0: cde
+
+/abc/
+    *** Failers
+No match
+    b
+No match
+    
+
+/a*/
+    
+
+/([abc])*d/
+    abbbcd
+ 0: abbbcd
+ 1: c
+
+/([abc])*bcd/
+    abcd
+ 0: abcd
+ 1: a
+
+/a|b|c|d|e/
+    e
+ 0: e
+
+/(a|b|c|d|e)f/
+    ef
+ 0: ef
+ 1: e
+
+/abcd*efg/
+    abcdefg
+ 0: abcdefg
+
+/ab*/
+    xabyabbbz
+ 0: ab
+    xayabbbz
+ 0: a
+
+/(ab|cd)e/
+    abcde
+ 0: cde
+ 1: cd
+
+/[abhgefdc]ij/
+    hij
+ 0: hij
+
+/(abc|)ef/
+    abcdef
+ 0: ef
+ 1: 
+
+/(a|b)c*d/
+    abcd
+ 0: bcd
+ 1: b
+
+/(ab|ab*)bc/
+    abc
+ 0: abc
+ 1: a
+
+/a([bc]*)c*/
+    abc
+ 0: abc
+ 1: bc
+
+/a([bc]*)(c*d)/
+    abcd
+ 0: abcd
+ 1: bc
+ 2: d
+
+/a([bc]+)(c*d)/
+    abcd
+ 0: abcd
+ 1: bc
+ 2: d
+
+/a([bc]*)(c+d)/
+    abcd
+ 0: abcd
+ 1: b
+ 2: cd
+
+/a[bcd]*dcdcde/
+    adcdcde
+ 0: adcdcde
+
+/a[bcd]+dcdcde/
+    *** Failers
+No match
+    abcde
+No match
+    adcdcde
+No match
+
+/(ab|a)b*c/
+    abc
+ 0: abc
+ 1: ab
+
+/((a)(b)c)(d)/
+    abcd
+ 0: abcd
+ 1: abc
+ 2: a
+ 3: b
+ 4: d
+
+/[a-zA-Z_][a-zA-Z0-9_]*/
+    alpha
+ 0: alpha
+
+/^a(bc+|b[eh])g|.h$/
+    abh
+ 0: bh
+
+/(bc+d$|ef*g.|h?i(j|k))/
+    effgz
+ 0: effgz
+ 1: effgz
+    ij
+ 0: ij
+ 1: ij
+ 2: j
+    reffgz
+ 0: effgz
+ 1: effgz
+    *** Failers
+No match
+    effg
+No match
+    bcdd
+No match
+
+/((((((((((a))))))))))/
+    a
+ 0: a
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+10: a
+
+/((((((((((a))))))))))\9/
+    aa
+ 0: aa
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+10: a
+
+/(((((((((a)))))))))/
+    a
+ 0: a
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+
+/multiple words of text/
+    *** Failers
+No match
+    aa
+No match
+    uh-uh
+No match
+
+/multiple words/
+    multiple words, yeah
+ 0: multiple words
+
+/(.*)c(.*)/
+    abcde
+ 0: abcde
+ 1: ab
+ 2: de
+
+/\((.*), (.*)\)/
+    (a, b)
+ 0: (a, b)
+ 1: a
+ 2: b
+
+/abcd/
+    abcd
+ 0: abcd
+
+/a(bc)d/
+    abcd
+ 0: abcd
+ 1: bc
+
+/a[-]?c/
+    ac
+ 0: ac
+
+/(abc)\1/
+    abcabc
+ 0: abcabc
+ 1: abc
+
+/([a-c]*)\1/
+    abcabc
+ 0: abcabc
+ 1: abc
+
+/(a)|\1/
+    a
+ 0: a
+ 1: a
+    *** Failers
+ 0: a
+ 1: a
+    ab
+ 0: a
+ 1: a
+    x
+No match
+
+/abc/i
+    ABC
+ 0: ABC
+    XABCY
+ 0: ABC
+    ABABC
+ 0: ABC
+    *** Failers
+No match
+    aaxabxbaxbbx
+No match
+    XBC
+No match
+    AXC
+No match
+    ABX
+No match
+
+/ab*c/i
+    ABC
+ 0: ABC
+
+/ab*bc/i
+    ABC
+ 0: ABC
+    ABBC
+ 0: ABBC
+
+/ab+bc/i
+    *** Failers
+No match
+    ABC
+No match
+    ABQ
+No match
+
+/ab+bc/i
+    ABBBBC
+ 0: ABBBBC
+
+/^abc$/i
+    ABC
+ 0: ABC
+    *** Failers
+No match
+    ABBBBC
+No match
+    ABCC
+No match
+
+/^abc/i
+    ABCC
+ 0: ABC
+
+/abc$/i
+    AABC
+ 0: ABC
+
+/^/i
+    ABC
+ 0: 
+
+/$/i
+    ABC
+ 0: 
+
+/a.c/i
+    ABC
+ 0: ABC
+    AXC
+ 0: AXC
+
+/a.*c/i
+    *** Failers
+No match
+    AABC
+ 0: AABC
+    AXYZD
+No match
+
+/a[bc]d/i
+    ABD
+ 0: ABD
+
+/a[b-d]e/i
+    ACE
+ 0: ACE
+    *** Failers
+No match
+    ABC
+No match
+    ABD
+No match
+
+/a[b-d]/i
+    AAC
+ 0: AC
+
+/a[-b]/i
+    A-
+ 0: A-
+
+/a[b-]/i
+    A-
+ 0: A-
+
+/a[]]b/i
+    A]B
+ 0: A]B
+
+/a[^bc]d/i
+    AED
+ 0: AED
+
+/a[^-b]c/i
+    ADC
+ 0: ADC
+    *** Failers
+No match
+    ABD
+No match
+    A-C
+No match
+
+/a[^]b]c/i
+    ADC
+ 0: ADC
+
+/ab|cd/i
+    ABC
+ 0: AB
+    ABCD
+ 0: AB
+
+/()ef/i
+    DEF
+ 0: EF
+ 1: 
+
+/$b/i
+    *** Failers
+No match
+    A]C
+No match
+    B
+No match
+
+/a\(b/i
+    A(B
+ 0: A(B
+
+/a\(*b/i
+    AB
+ 0: AB
+    A((B
+ 0: A((B
+
+/((a))/i
+    ABC
+ 0: A
+ 1: A
+ 2: A
+
+/(a)b(c)/i
+    ABC
+ 0: ABC
+ 1: A
+ 2: C
+
+/a+b+c/i
+    AABBABC
+ 0: ABC
+
+/a{1,}b{1,}c/i
+    AABBABC
+ 0: ABC
+
+/(a+|b)*/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b){0,}/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b)+/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b){1,}/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b)?/i
+    AB
+ 0: A
+ 1: A
+
+/(a+|b){0,1}/i
+    AB
+ 0: A
+ 1: A
+
+/[^ab]*/i
+    CDE
+ 0: CDE
+
+/([abc])*d/i
+    ABBBCD
+ 0: ABBBCD
+ 1: C
+
+/([abc])*bcd/i
+    ABCD
+ 0: ABCD
+ 1: A
+
+/a|b|c|d|e/i
+    E
+ 0: E
+
+/(a|b|c|d|e)f/i
+    EF
+ 0: EF
+ 1: E
+
+/abcd*efg/i
+    ABCDEFG
+ 0: ABCDEFG
+
+/ab*/i
+    XABYABBBZ
+ 0: AB
+    XAYABBBZ
+ 0: A
+
+/(ab|cd)e/i
+    ABCDE
+ 0: CDE
+ 1: CD
+
+/[abhgefdc]ij/i
+    HIJ
+ 0: HIJ
+
+/^(ab|cd)e/i
+    ABCDE
+No match
+
+/(abc|)ef/i
+    ABCDEF
+ 0: EF
+ 1: 
+
+/(a|b)c*d/i
+    ABCD
+ 0: BCD
+ 1: B
+
+/(ab|ab*)bc/i
+    ABC
+ 0: ABC
+ 1: A
+
+/a([bc]*)c*/i
+    ABC
+ 0: ABC
+ 1: BC
+
+/a([bc]*)(c*d)/i
+    ABCD
+ 0: ABCD
+ 1: BC
+ 2: D
+
+/a([bc]+)(c*d)/i
+    ABCD
+ 0: ABCD
+ 1: BC
+ 2: D
+
+/a([bc]*)(c+d)/i
+    ABCD
+ 0: ABCD
+ 1: B
+ 2: CD
+
+/a[bcd]*dcdcde/i
+    ADCDCDE
+ 0: ADCDCDE
+
+/a[bcd]+dcdcde/i
+
+/(ab|a)b*c/i
+    ABC
+ 0: ABC
+ 1: AB
+
+/((a)(b)c)(d)/i
+    ABCD
+ 0: ABCD
+ 1: ABC
+ 2: A
+ 3: B
+ 4: D
+
+/[a-zA-Z_][a-zA-Z0-9_]*/i
+    ALPHA
+ 0: ALPHA
+
+/^a(bc+|b[eh])g|.h$/i
+    ABH
+ 0: BH
+
+/(bc+d$|ef*g.|h?i(j|k))/i
+    EFFGZ
+ 0: EFFGZ
+ 1: EFFGZ
+    IJ
+ 0: IJ
+ 1: IJ
+ 2: J
+    REFFGZ
+ 0: EFFGZ
+ 1: EFFGZ
+    *** Failers
+No match
+    ADCDCDE
+No match
+    EFFG
+No match
+    BCDD
+No match
+
+/((((((((((a))))))))))/i
+    A
+ 0: A
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+10: A
+
+/((((((((((a))))))))))\9/i
+    AA
+ 0: AA
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+10: A
+
+/(((((((((a)))))))))/i
+    A
+ 0: A
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+
+/multiple words of text/i
+    *** Failers
+No match
+    AA
+No match
+    UH-UH
+No match
+
+/multiple words/i
+    MULTIPLE WORDS, YEAH
+ 0: MULTIPLE WORDS
+
+/(.*)c(.*)/i
+    ABCDE
+ 0: ABCDE
+ 1: AB
+ 2: DE
+
+/\((.*), (.*)\)/i
+    (A, B)
+ 0: (A, B)
+ 1: A
+ 2: B
+
+/abcd/i
+    ABCD
+ 0: ABCD
+
+/a(bc)d/i
+    ABCD
+ 0: ABCD
+ 1: BC
+
+/a[-]?c/i
+    AC
+ 0: AC
+
+/(abc)\1/i
+    ABCABC
+ 0: ABCABC
+ 1: ABC
+
+/([a-c]*)\1/i
+    ABCABC
+ 0: ABCABC
+ 1: ABC
+
+/((foo)|(bar))*/
+    foobar
+ 0: foobar
+ 1: bar
+ 2: foo
+ 3: bar
+
+/^(.+)?B/
+    AB
+ 0: AB
+ 1: A
+
+/^([^a-z])|(\^)$/
+    .
+ 0: .
+ 1: .
+
+/^[<>]&/
+    <&OUT
+ 0: <&
+
+/^(){3,5}/
+    abc
+ 0: 
+ 1: 
+
+/^(a+)*ax/
+    aax
+ 0: aax
+ 1: a
+
+/^((a|b)+)*ax/
+    aax
+ 0: aax
+ 1: a
+ 2: a
+
+/^((a|bc)+)*ax/
+    aax
+ 0: aax
+ 1: a
+ 2: a
+
+/(a|x)*ab/
+    cab
+ 0: ab
+
+/(a)*ab/
+    cab
+ 0: ab
+
+/(ab)[0-9]\1/i
+    Ab4ab
+ 0: Ab4ab
+ 1: Ab
+    ab4Ab
+ 0: ab4Ab
+ 1: ab
+
+/foo\w*[0-9]{4}baz/
+    foobar1234baz
+ 0: foobar1234baz
+
+/(\w+:)+/
+    one:
+ 0: one:
+ 1: one:
+
+/((\w|:)+::)?(\w+)$/
+    abcd
+ 0: abcd
+ 1: <unset>
+ 2: <unset>
+ 3: abcd
+    xy:z:::abcd
+ 0: xy:z:::abcd
+ 1: xy:z:::
+ 2: :
+ 3: abcd
+
+/^[^bcd]*(c+)/
+    aexycd
+ 0: aexyc
+ 1: c
+
+/(a*)b+/
+    caab
+ 0: aab
+ 1: aa
+
+/((\w|:)+::)?(\w+)$/
+    abcd
+ 0: abcd
+ 1: <unset>
+ 2: <unset>
+ 3: abcd
+    xy:z:::abcd
+ 0: xy:z:::abcd
+ 1: xy:z:::
+ 2: :
+ 3: abcd
+    *** Failers
+ 0: Failers
+ 1: <unset>
+ 2: <unset>
+ 3: Failers
+    abcd:
+No match
+    abcd:
+No match
+
+/^[^bcd]*(c+)/
+    aexycd
+ 0: aexyc
+ 1: c
+
+/((Z)+|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: Z
+
+/(Z()|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: 
+
+/(Z(())|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: 
+ 3: 
+
+/(.*)[0-9]+\1/
+    abc123abc
+ 0: abc123abc
+ 1: abc
+    abc123bc 
+ 0: bc123bc
+ 1: bc
+
+/((.*))[0-9]+\1/
+    abc123abc
+ 0: abc123abc
+ 1: abc
+ 2: abc
+    abc123bc  
+ 0: bc123bc
+ 1: bc
+ 2: bc
+
+/^a{2,5}$/
+    aa
+ 0: aa
+    aaa
+ 0: aaa
+    aaaa
+ 0: aaaa
+    aaaaa
+ 0: aaaaa
+    *** Failers
+No match
+    a
+No match
+    b
+No match
+    aaaaab
+No match
+    aaaaaa
diff --git a/test/src/regex/regex-resources/PTESTS b/test/src/regex/regex-resources/PTESTS
new file mode 100644
index 0000000..02b357c
--- /dev/null
+++ b/test/src/regex/regex-resources/PTESTS
@@ -0,0 +1,341 @@
+# 2.8.2  Regular Expression General Requirement
+2¦4¦bb*¦abbbc¦
+2¦2¦bb*¦ababbbc¦
+7¦9¦A#*::¦A:A#:qA::qA#::qA##::q¦
+1¦5¦A#*::¦A##::A#::qA::qA#:q¦
+# 2.8.3.1.2  BRE Special Characters
+# GA108
+2¦2¦\.¦a.c¦
+2¦2¦\[¦a[c¦
+2¦2¦\\¦a\c¦
+2¦2¦\*¦a*c¦
+2¦2¦\^¦a^c¦
+2¦2¦\$¦a$c¦
+7¦11¦X\*Y\*8¦Y*8X*8X*Y*8¦
+# GA109
+2¦2¦[.]¦a.c¦
+2¦2¦[[]¦a[c¦
+-1¦-1¦[[]¦ac¦
+2¦2¦[\]¦a\c¦
+1¦1¦[\a]¦abc¦
+2¦2¦[\.]¦a\.c¦
+2¦2¦[\.]¦a.\c¦
+2¦2¦[*]¦a*c¦
+2¦2¦[$]¦a$c¦
+2¦2¦[X*Y8]¦7*8YX¦
+# GA110
+2¦2¦*¦a*c¦
+3¦4¦*a¦*b*a*c¦
+1¦5¦**9=¦***9=9¦
+# GA111
+1¦1¦^*¦*bc¦
+-1¦-1¦^*¦a*c¦
+-1¦-1¦^*¦^*ab¦
+1¦5¦^**9=¦***9=¦
+-1¦-1¦^*5<*9¦5<9*5<*9¦
+# GA112
+2¦3¦\(*b\)¦a*b¦
+-1¦-1¦\(*b\)¦ac¦
+1¦6¦A\(**9\)=¦A***9=79¦
+# GA113(1)
+1¦3¦\(^*ab\)¦*ab¦
+-1¦-1¦\(^*ab\)¦^*ab¦
+-1¦-1¦\(^*b\)¦a*b¦
+-1¦-1¦\(^*b\)¦^*b¦
+### GA113(2)			GNU regex implements GA113(1)
+##-1¦-1¦\(^*ab\)¦*ab¦
+##-1¦-1¦\(^*ab\)¦^*ab¦
+##1¦1¦\(^*b\)¦b¦
+##1¦3¦\(^*b\)¦^^b¦
+# GA114
+1¦3¦a^b¦a^b¦
+1¦3¦a\^b¦a^b¦
+1¦1¦^^¦^bc¦
+2¦2¦\^¦a^c¦
+1¦1¦[c^b]¦^abc¦
+1¦1¦[\^ab]¦^ab¦
+2¦2¦[\^ab]¦c\d¦
+-1¦-1¦[^^]¦^¦
+1¦3¦\(a^b\)¦a^b¦
+1¦3¦\(a\^b\)¦a^b¦
+2¦2¦\(\^\)¦a^b¦
+# GA115
+3¦3¦$$¦ab$¦
+-1¦-1¦$$¦$ab¦
+2¦3¦$c¦a$c¦
+2¦2¦[$]¦a$c¦
+1¦2¦\$a¦$a¦
+3¦3¦\$$¦ab$¦
+2¦6¦A\([34]$[34]\)B¦XA4$3BY¦
+# 2.8.3.1.3  Periods in BREs
+# GA116
+1¦1¦.¦abc¦
+-1¦-1¦.ab¦abc¦
+1¦3¦ab.¦abc¦
+1¦3¦a.b¦a,b¦
+-1¦-1¦.......¦PqRs6¦
+1¦7¦.......¦PqRs6T8¦
+# 2.8.3.2  RE Bracket Expression
+# GA118
+2¦2¦[abc]¦xbyz¦
+-1¦-1¦[abc]¦xyz¦
+2¦2¦[abc]¦xbay¦
+# GA119
+2¦2¦[^a]¦abc¦
+4¦4¦[^]cd]¦cd]ef¦
+2¦2¦[^abc]¦axyz¦
+-1¦-1¦[^abc]¦abc¦
+3¦3¦[^[.a.]b]¦abc¦
+3¦3¦[^[=a=]b]¦abc¦
+2¦2¦[^-ac]¦abcde-¦
+2¦2¦[^ac-]¦abcde-¦
+3¦3¦[^a-b]¦abcde¦
+3¦3¦[^a-bd-e]¦dec¦
+2¦2¦[^---]¦-ab¦
+16¦16¦[^a-zA-Z0-9]¦pqrstVWXYZ23579#¦
+# GA120(1)
+3¦3¦[]a]¦cd]ef¦
+1¦1¦[]-a]¦a_b¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+1¦1¦[]^a-z]¦string¦
+# GA120(2)
+4¦4¦[^]cd]¦cd]ef¦
+0¦0¦[^]]*¦]]]]]]]]X¦
+0¦0¦[^]]*¦]]]]]]]]¦
+9¦9¦[^]]\{1,\}¦]]]]]]]]X¦
+-1¦-1¦[^]]\{1,\}¦]]]]]]]]¦
+# GA120(3)
+3¦3¦[c[.].]d]¦ab]cd¦
+2¦8¦[a-z]*[[.].]][A-Z]*¦Abcd]DEFg¦
+# GA121
+2¦2¦[[.a.]b]¦Abc¦
+1¦1¦[[.a.]b]¦aBc¦
+-1¦-1¦[[.a.]b]¦ABc¦
+3¦3¦[^[.a.]b]¦abc¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+3¦3¦[A-[.].]c]¦ab]!¦
+# GA122
+-2¦-2¦[[.ch.]]¦abc¦
+-2¦-2¦[[.ab.][.CD.][.EF.]]¦yZabCDEFQ9¦
+# GA125
+2¦2¦[[=a=]b]¦Abc¦
+1¦1¦[[=a=]b]¦aBc¦
+-1¦-1¦[[=a=]b]¦ABc¦
+3¦3¦[^[=a=]b]¦abc¦
+# GA126
+#W the expected result for [[:alnum:]]* is 2-7 which is wrong
+0¦0¦[[:alnum:]]*¦ aB28gH¦
+2¦7¦[[:alnum:]][[:alnum:]]*¦ aB28gH¦
+#W the expected result for [^[:alnum:]]* is 2-5 which is wrong
+0¦0¦[^[:alnum:]]*¦2 	,\x7fa¦
+2¦5¦[^[:alnum:]][^[:alnum:]]*¦2 	,\x7fa¦
+#W the expected result for [[:alpha:]]* is 2-5 which is wrong
+0¦0¦[[:alpha:]]*¦ aBgH2¦
+2¦5¦[[:alpha:]][[:alpha:]]*¦ aBgH2¦
+1¦6¦[^[:alpha:]]*¦2 	8,\x7fa¦
+1¦2¦[[:blank:]]*¦ 	
\x7f¦
+1¦8¦[^[:blank:]]*¦aB28gH,\x7f ¦
+1¦2¦[[:cntrl:]]*¦	\x7f ¦
+1¦8¦[^[:cntrl:]]*¦aB2 8gh,¦
+#W the expected result for [[:digit:]]* is 2-3 which is wrong
+0¦0¦[[:digit:]]*¦a28¦
+2¦3¦[[:digit:]][[:digit:]]*¦a28¦
+1¦8¦[^[:digit:]]*¦aB 	gH,\x7f¦
+1¦7¦[[:graph:]]*¦aB28gH, ¦
+1¦3¦[^[:graph:]]*¦ 	\x7f,¦
+1¦2¦[[:lower:]]*¦agB¦
+1¦8¦[^[:lower:]]*¦B2 	8H,\x7fa¦
+1¦8¦[[:print:]]*¦aB2 8gH,	¦
+1¦2¦[^[:print:]]*¦	\x7f ¦
+#W the expected result for [[:punct:]]* is 2-2 which is wrong
+0¦0¦[[:punct:]]*¦a,2¦
+2¦3¦[[:punct:]][[:punct:]]*¦a,,2¦
+1¦9¦[^[:punct:]]*¦aB2 	8gH\x7f¦
+1¦3¦[[:space:]]*¦ 	
\x7f¦
+#W the expected result for [^[:space:]]* is 2-9 which is wrong
+0¦0¦[^[:space:]]*¦ aB28gH,\x7f	¦
+2¦9¦[^[:space:]][^[:space:]]*¦ aB28gH,\x7f	¦
+#W the expected result for [[:upper:]]* is 2-3 which is wrong
+0¦0¦[[:upper:]]*¦aBH2¦
+2¦3¦[[:upper:]][[:upper:]]*¦aBH2¦
+1¦8¦[^[:upper:]]*¦a2 	8g,\x7fB¦
+#W the expected result for [[:xdigit:]]* is 2-5 which is wrong
+0¦0¦[[:xdigit:]]*¦gaB28h¦
+2¦5¦[[:xdigit:]][[:xdigit:]]*¦gaB28h¦
+#W the expected result for [^[:xdigit:]]* is 2-7 which is wrong
+2¦7¦[^[:xdigit:]][^[:xdigit:]]*¦a 	gH,\x7f2¦
+# GA127
+-2¦-2¦[b-a]¦abc¦
+1¦1¦[a-c]¦bbccde¦
+2¦2¦[a-b]¦-bc¦
+3¦3¦[a-z0-9]¦AB0¦
+3¦3¦[^a-b]¦abcde¦
+3¦3¦[^a-bd-e]¦dec¦
+1¦1¦[]-a]¦a_b¦
+2¦2¦[+--]¦a,b¦
+2¦2¦[--/]¦a.b¦
+2¦2¦[^---]¦-ab¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+3¦3¦[A-[.].]c]¦ab]!¦
+2¦6¦bc[d-w]xy¦abchxyz¦
+# GA129
+1¦1¦[a-cd-f]¦dbccde¦
+-1¦-1¦[a-ce-f]¦dBCCdE¦
+2¦4¦b[n-zA-M]Y¦absY9Z¦
+2¦4¦b[n-zA-M]Y¦abGY9Z¦
+# GA130
+3¦3¦[-xy]¦ac-¦
+2¦4¦c[-xy]D¦ac-D+¦
+2¦2¦[--/]¦a.b¦
+2¦4¦c[--/]D¦ac.D+b¦
+2¦2¦[^-ac]¦abcde-¦
+1¦3¦a[^-ac]c¦abcde-¦
+3¦3¦[xy-]¦zc-¦
+2¦4¦c[xy-]7¦zc-786¦
+2¦2¦[^ac-]¦abcde-¦
+2¦4¦a[^ac-]c¦5abcde-¦
+2¦2¦[+--]¦a,b¦
+2¦4¦a[+--]B¦Xa,By¦
+2¦2¦[^---]¦-ab¦
+4¦6¦X[^---]Y¦X-YXaYXbY¦
+# 2.8.3.3  BREs Matching Multiple Characters
+# GA131
+3¦4¦cd¦abcdeabcde¦
+1¦2¦ag*b¦abcde¦
+-1¦-1¦[a-c][e-f]¦abcdef¦
+3¦4¦[a-c][e-f]¦acbedf¦
+4¦8¦abc*XYZ¦890abXYZ#*¦
+4¦9¦abc*XYZ¦890abcXYZ#*¦
+4¦15¦abc*XYZ¦890abcccccccXYZ#*¦
+-1¦-1¦abc*XYZ¦890abc*XYZ#*¦
+# GA132
+2¦4¦\(*bc\)¦a*bc¦
+1¦2¦\(ab\)¦abcde¦
+1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦
+3¦8¦43\(2\(6\)*0\)AB¦654320ABCD¦
+3¦9¦43\(2\(7\)*0\)AB¦6543270ABCD¦
+3¦12¦43\(2\(7\)*0\)AB¦6543277770ABCD¦
+# GA133
+1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦
+-1¦-1¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(k\)\)\)\)\)\)\)\)¦abcdefghijk¦
+# GA134
+2¦4¦\(bb*\)¦abbbc¦
+2¦2¦\(bb*\)¦ababbbc¦
+1¦6¦a\(.*b\)¦ababbbc¦
+1¦2¦a\(b*\)¦ababbbc¦
+1¦20¦a\(.*b\)c¦axcaxbbbcsxbbbbbbbbc¦
+# GA135
+1¦7¦\(a\(b\(c\(d\(e\)\)\)\)\)\4¦abcdededede¦
+#W POSIX does not really specify whether a\(b\)*c\1 matches acb.
+#W back references are supposed to expand to the last match, but what
+#W if there never was a match as in this case?
+-1¦-1¦a\(b\)*c\1¦acb¦
+1¦11¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)\9¦abcdefghijjk¦
+# GA136
+#W These two tests have the same problem as the test in GA135.  No match
+#W of a subexpression, why should the back reference be usable?
+#W 1 2 a\(b\)*c\1 acb
+#W 4 7 a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST
+-1¦-1¦a\(b\)*c\1¦acb¦
+-1¦-1¦a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST¦
+# GA137
+-2¦-2¦\(a\(b\)\)\3¦foo¦
+-2¦-2¦\(a\(b\)\)\(a\(b\)\)\5¦foo¦
+# GA138
+1¦2¦ag*b¦abcde¦
+1¦10¦a.*b¦abababvbabc¦
+2¦5¦b*c¦abbbcdeabbbbbbcde¦
+2¦5¦bbb*c¦abbbcdeabbbbbbcde¦
+1¦5¦a\(b\)*c\1¦abbcbbb¦
+-1¦-1¦a\(b\)*c\1¦abbdbd¦
+0¦0¦\([a-c]*\)\1¦abcacdef¦
+1¦6¦\([a-c]*\)\1¦abcabcabcd¦
+1¦2¦a^*b¦ab¦
+1¦5¦a^*b¦a^^^b¦
+# GA139
+1¦2¦a\{2\}¦aaaa¦
+1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦
+1¦2¦\(a\)\1\{1,2\}¦aabc¦
+1¦3¦\(a\)\1\{1,2\}¦aaaabc¦
+#W the expression \(\(a\)\1\)\{1,2\} is ill-formed, using \2
+1¦4¦\(\(a\)\2\)\{1,2\}¦aaaabc¦
+# GA140
+1¦2¦a\{2\}¦aaaa¦
+-1¦-1¦a\{2\}¦abcd¦
+0¦0¦a\{0\}¦aaaa¦
+1¦64¦a\{64\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# GA141
+1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦
+#W the expected result for \([a-c]*\)\{2,\} is failure which isn't correct
+1¦3¦\([a-c]*\)\{2,\}¦abcdefg¦
+1¦3¦\([a-c]*\)\{1,\}¦abcdefg¦
+-1¦-1¦a\{64,\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# GA142
+1¦3¦a\{2,3\}¦aaaa¦
+-1¦-1¦a\{2,3\}¦abcd¦
+0¦0¦\([a-c]*\)\{0,0\}¦foo¦
+1¦63¦a\{1,63\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# 2.8.3.4  BRE Precedence
+# GA143
+#W There are numerous bugs in the original version.
+2¦19¦\^\[[[.].]]\\(\\1\\)\*\\{1,2\\}\$¦a^[]\(\1\)*\{1,2\}$b¦
+1¦6¦[[=*=]][[=\=]][[=]=]][[===]][[...]][[:punct:]]¦*\]=.;¦
+1¦6¦[$\(*\)^]*¦$\()*^¦
+1¦1¦[\1]¦1¦
+1¦1¦[\{1,2\}]¦{¦
+#W the expected result for \(*\)*\1* is 2-2 which isn't correct
+0¦0¦\(*\)*\1*¦a*b*11¦
+2¦3¦\(*\)*\1*b¦a*b*11¦
+#W the expected result for \(a\(b\{1,2\}\)\{1,2\}\) is 1-5 which isn't correct
+1¦3¦\(a\(b\{1,2\}\)\{1,2\}\)¦abbab¦
+1¦5¦\(a\(b\{1,2\}\)\)\{1,2\}¦abbab¦
+1¦1¦^\(^\(^a$\)$\)$¦a¦
+1¦2¦\(a\)\1$¦aa¦
+1¦3¦ab*¦abb¦
+1¦4¦ab\{2,4\}¦abbbc¦
+# 2.8.3.5  BRE Expression Anchoring
+# GA144
+1¦1¦^a¦abc¦
+-1¦-1¦^b¦abc¦
+-1¦-1¦^[a-zA-Z]¦99Nine¦
+1¦4¦^[a-zA-Z]*¦Nine99¦
+# GA145(1)
+1¦2¦\(^a\)\1¦aabc¦
+-1¦-1¦\(^a\)\1¦^a^abc¦
+1¦2¦\(^^a\)¦^a¦
+1¦1¦\(^^\)¦^^¦
+1¦3¦\(^abc\)¦abcdef¦
+-1¦-1¦\(^def\)¦abcdef¦
+### GA145(2)			GNU regex implements GA145(1)
+##-1¦-1¦\(^a\)\1¦aabc¦
+##1¦4¦\(^a\)\1¦^a^abc¦
+##-1¦-1¦\(^^a\)¦^a¦
+##1¦2¦\(^^\)¦^^¦
+# GA146
+3¦3¦a$¦cba¦
+-1¦-1¦a$¦abc¦
+5¦7¦[a-z]*$¦99ZZxyz¦
+#W the expected result for [a-z]*$ is failure which isn't correct
+10¦9¦[a-z]*$¦99ZZxyz99¦
+3¦3¦$$¦ab$¦
+-1¦-1¦$$¦$ab¦
+3¦3¦\$$¦ab$¦
+# GA147(1)
+-1¦-1¦\(a$\)\1¦bcaa¦
+-1¦-1¦\(a$\)\1¦ba$¦
+-1¦-1¦\(ab$\)¦ab$¦
+1¦2¦\(ab$\)¦ab¦
+4¦6¦\(def$\)¦abcdef¦
+-1¦-1¦\(abc$\)¦abcdef¦
+### GA147(2)			GNU regex implements GA147(1)
+##-1¦-1¦\(a$\)\1¦bcaa¦
+##2¦5¦\(a$\)\1¦ba$a$¦
+##-1¦-1¦\(ab$\)¦ab¦
+##1¦3¦\(ab$\)¦ab$¦
+# GA148
+0¦0¦^$¦¦
+1¦3¦^abc$¦abc¦
+-1¦-1¦^xyz$¦^xyz^¦
+-1¦-1¦^234$¦^234$¦
+1¦9¦^[a-zA-Z0-9]*$¦2aA3bB9zZ¦
+-1¦-1¦^[a-z0-9]*$¦2aA3b#B9zZ¦
diff --git a/test/src/regex/regex-resources/TESTS b/test/src/regex/regex-resources/TESTS
new file mode 100644
index 0000000..f2c9886
--- /dev/null
+++ b/test/src/regex/regex-resources/TESTS
@@ -0,0 +1,167 @@
+0:(.*)*\1:xx
+0:^:
+0:$:
+0:^$:
+0:^a$:a
+0:abc:abc
+1:abc:xbc
+1:abc:axc
+1:abc:abx
+0:abc:xabcy
+0:abc:ababc
+0:ab*c:abc
+0:ab*bc:abc
+0:ab*bc:abbc
+0:ab*bc:abbbbc
+0:ab+bc:abbc
+1:ab+bc:abc
+1:ab+bc:abq
+0:ab+bc:abbbbc
+0:ab?bc:abbc
+0:ab?bc:abc
+1:ab?bc:abbbbc
+0:ab?c:abc
+0:^abc$:abc
+1:^abc$:abcc
+0:^abc:abcc
+1:^abc$:aabc
+0:abc$:aabc
+0:^:abc
+0:$:abc
+0:a.c:abc
+0:a.c:axc
+0:a.*c:axyzc
+1:a.*c:axyzd
+1:a[bc]d:abc
+0:a[bc]d:abd
+1:a[b-d]e:abd
+0:a[b-d]e:ace
+0:a[b-d]:aac
+0:a[-b]:a-
+0:a[b-]:a-
+2:a[b-a]:-
+2:a[]b:-
+2:a[:-
+0:a]:a]
+0:a[]]b:a]b
+0:a[^bc]d:aed
+1:a[^bc]d:abd
+0:a[^-b]c:adc
+1:a[^-b]c:a-c
+1:a[^]b]c:a]c
+0:a[^]b]c:adc
+0:ab|cd:abc
+0:ab|cd:abcd
+0:()ef:def
+0:()*:-
+2:*a:-
+2:^*:-
+2:$*:-
+2:(*)b:-
+1:$b:b
+2:a\:-
+0:a\(b:a(b
+0:a\(*b:ab
+0:a\(*b:a((b
+1:a\x:a\x
+1:abc):-
+2:(abc:-
+0:((a)):abc
+0:(a)b(c):abc
+0:a+b+c:aabbabc
+0:a**:-
+0:a*?:-
+0:(a*)*:-
+0:(a*)+:-
+0:(a|)*:-
+0:(a*|b)*:-
+0:(a+|b)*:ab
+0:(a+|b)+:ab
+0:(a+|b)?:ab
+0:[^ab]*:cde
+0:(^)*:-
+0:(ab|)*:-
+2:)(:-
+1:abc:
+1:abc:
+0:a*:
+0:([abc])*d:abbbcd
+0:([abc])*bcd:abcd
+0:a|b|c|d|e:e
+0:(a|b|c|d|e)f:ef
+0:((a*|b))*:-
+0:abcd*efg:abcdefg
+0:ab*:xabyabbbz
+0:ab*:xayabbbz
+0:(ab|cd)e:abcde
+0:[abhgefdc]ij:hij
+1:^(ab|cd)e:abcde
+0:(abc|)ef:abcdef
+0:(a|b)c*d:abcd
+0:(ab|ab*)bc:abc
+0:a([bc]*)c*:abc
+0:a([bc]*)(c*d):abcd
+0:a([bc]+)(c*d):abcd
+0:a([bc]*)(c+d):abcd
+0:a[bcd]*dcdcde:adcdcde
+1:a[bcd]+dcdcde:adcdcde
+0:(ab|a)b*c:abc
+0:((a)(b)c)(d):abcd
+0:[A-Za-z_][A-Za-z0-9_]*:alpha
+0:^a(bc+|b[eh])g|.h$:abh
+0:(bc+d$|ef*g.|h?i(j|k)):effgz
+0:(bc+d$|ef*g.|h?i(j|k)):ij
+1:(bc+d$|ef*g.|h?i(j|k)):effg
+1:(bc+d$|ef*g.|h?i(j|k)):bcdd
+0:(bc+d$|ef*g.|h?i(j|k)):reffgz
+1:((((((((((a)))))))))):-
+0:(((((((((a))))))))):a
+1:multiple words of text:uh-uh
+0:multiple words:multiple words, yeah
+0:(.*)c(.*):abcde
+1:\((.*),:(.*)\)
+1:[k]:ab
+0:abcd:abcd
+0:a(bc)d:abcd
+0:a[\x01-\x03]?c:a\x02c
+0:(....).*\1:beriberi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Qaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar Gadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Qadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar El Kadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Gadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamer El Kazzafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar al-Gaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Al Qathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Al Qathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar el-Gadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar El Kadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar al-Qadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qadhdhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Qadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar Gaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Qadhdhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Khaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar al-Khaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'amar al-Kadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghaddafy
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Quathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Gheddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar Al-Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar Khadafy 
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar Qudhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mulazim Awwal Mu'ammar Muhammad Abu Minyar al-Qadhafi
+0:[[:digit:]]+:01234
+1:[[:alpha:]]+:01234
+0:^[[:digit:]]*$:01234
+1:^[[:digit:]]*$:01234a
+0:^[[:alnum:]]*$:01234a
+0:^[[:xdigit:]]*$:01234a
+1:^[[:xdigit:]]*$:01234g
+0:^[[:alnum:][:space:]]*$:Hello world
-- 
2.1.4
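
A reading aid for the PTESTS data above, not part of the patch: judging
by the first entries, each data line appears to have the form
START¦END¦PATTERN¦STRING¦, where START and END look like the 1-based,
inclusive bounds of the expected match. Negative values seem to mark
"no match" (-1) and "pattern does not compile" (-2), and lines beginning
with # are comments. The sketch below assumes exactly that reading. The
names my-ptests-parse-line and my-ptests-check are hypothetical and do
not appear in the driver.

```elisp
;; Hypothetical helpers for illustration only; not part of the patch.
;; Assumption: a PTESTS data line is START¦END¦PATTERN¦STRING¦ with
;; 1-based, inclusive bounds.

(defun my-ptests-parse-line (line)
  "Split a PTESTS LINE into a plist, or return nil for a # comment."
  (unless (string-prefix-p "#" line)
    (let ((fields (split-string line "¦")))
      (list :start   (string-to-number (nth 0 fields))
            :end     (string-to-number (nth 1 fields))
            :pattern (nth 2 fields)
            :string  (nth 3 fields)))))

(defun my-ptests-check (line)
  "Return non-nil if the match described by the data LINE is found.
Only handles lines that expect a successful match."
  (let* ((case-fold-search nil)
         (test (my-ptests-parse-line line))
         (pos  (string-match (plist-get test :pattern)
                             (plist-get test :string))))
    (and pos
         ;; string-match is 0-based and match-end is exclusive, so the
         ;; 1-based inclusive bounds are (1+ pos) and (match-end 0).
         (= (1+ pos)      (plist-get test :start))
         (= (match-end 0) (plist-get test :end)))))

(my-ptests-check "2¦4¦bb*¦abbbc¦")   ; => t
```

The patterns in that file are POSIX BREs, which mostly coincide with
Emacs syntax for \( \) and \{ \}, so in this sketch they are handed to
string-match unchanged.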


[-- Attachment #3: 0002-Added-driver-for-the-regex-tests.patch --]
[-- Type: text/x-diff, Size: 22493 bytes --]

From c5687df16cd4cb73fc593556a68a124354f273db Mon Sep 17 00:00:00 2001
From: Dima Kogan <dima@secretsauce.net>
Date: Sat, 27 Feb 2016 18:06:35 -0800
Subject: [PATCH 2/2] Added driver for the regex tests

* test/src/regex/regex-tests.el: regex test driver
---
 test/src/regex/regex-tests.el | 590 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 590 insertions(+)
 create mode 100644 test/src/regex/regex-tests.el

diff --git a/test/src/regex/regex-tests.el b/test/src/regex/regex-tests.el
new file mode 100644
index 0000000..8709f90
--- /dev/null
+++ b/test/src/regex/regex-tests.el
@@ -0,0 +1,590 @@
+;;; regex-tests.el --- tests for regex.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2016 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
+
+(require 'ert)
+
+(defmacro regex-tests-generic-line (comment-char test-file whitelist &rest body)
+  "Read the test file TEST-FILE line by line, skipping comment
+lines (those starting with COMMENT-CHAR), and evaluate BODY on
+each remaining line.  Line numbers in the WHITELIST are known
+failures, and are skipped."
+
+  `(with-temp-buffer
+    (modify-syntax-entry ?_ "w;; ") ; tests expect _ to be a word
+    (insert-file-contents ,(concat (file-name-directory (or load-file-name (buffer-file-name))) test-file))
+
+    (let ((case-fold-search nil)
+          (line-number 1)
+          (whitelist-idx 0))
+
+      (goto-char (point-min))
+
+      (while (not (eobp))
+        (let ((start (point)))
+          (end-of-line)
+          (narrow-to-region start (point))
+
+          (goto-char (point-min))
+
+          (when
+              (and
+               ;; ignore comments
+               (save-excursion
+                 (re-search-forward ,(concat "^[^" (string comment-char) "]") nil t))
+
+               ;; skip lines in the whitelist
+               (let ((whitelist-next
+                      (condition-case nil
+                          (aref ,whitelist whitelist-idx) (args-out-of-range nil))))
+                 (cond
+                  ;; whitelist exhausted. do process this line
+                  ((null whitelist-next) t)
+
+                  ;; we're not yet at the next whitelist element. do
+                  ;; process this line
+                  ((< line-number whitelist-next) t)
+
+                  ;; we're past the next whitelist element. This
+                  ;; shouldn't happen
+                  ((> line-number whitelist-next)
+                   (error
+                    "We somehow skipped the next whitelist element: line %d"
+                    whitelist-next))
+
+                  ;; we're at the next whitelist element. Skip this
+                  ;; line, and advance the whitelist index
+                  (t
+                   (setq whitelist-idx (1+ whitelist-idx)) nil))))
+            ,@body)
+
+          (widen)
+          (forward-line)
+          (beginning-of-line)
+          (setq line-number (1+ line-number)))))))
+
+(defun regex-tests-compare (string what-failed bounds-ref &optional substring-ref)
+  "I just ran a search, looking at STRING.  WHAT-FAILED describes
+what failed, if anything; valid values are 'search-failed,
+'compilation-failed and nil.  I compare the beginning/end of each
+group with their expected values.  This is done with either
+BOUNDS-REF or SUBSTRING-REF; one of those should be non-nil.
+BOUNDS-REF is a sequence \[start-ref0 end-ref0 start-ref1
+end-ref1 ....] while SUBSTRING-REF is the expected substring
+obtained by indexing the input string by start/end-ref.
+
+If the search was supposed to fail then start-ref0/substring-ref0
+is 'search-failed.  If the search wasn't even supposed to compile
+successfully, then start-ref0/substring-ref0 is
+'compilation-failed.  If I only care about a match succeeding,
+this can be set to t.
+
+This function returns a string that describes the failure, or nil
+on success"
+
+  (when (or
+         (and bounds-ref substring-ref)
+         (not (or bounds-ref substring-ref)))
+    (error "Exactly one of bounds-ref and substring-ref should be non-nil"))
+
+  (let ((what-failed-ref (car (or bounds-ref substring-ref))))
+
+    (cond
+     ((eq what-failed 'search-failed)
+      (cond
+       ((eq what-failed-ref 'search-failed)
+        nil)
+       ((eq what-failed-ref 'compilation-failed)
+        "Expected pattern failure; but no match")
+       (t
+        "Expected match; but no match")))
+
+     ((eq what-failed 'compilation-failed)
+      (cond
+       ((eq what-failed-ref 'search-failed)
+        "Expected no match; but pattern failure")
+       ((eq what-failed-ref 'compilation-failed)
+        nil)
+       (t
+        "Expected match; but pattern failure")))
+
+     ;; The regex match succeeded
+     ((eq what-failed-ref 'search-failed)
+      "Expected no match; but match")
+     ((eq what-failed-ref 'compilation-failed)
+      "Expected pattern failure; but match")
+
+     ;; The regex match succeeded, as expected. I now check all the
+     ;; bounds
+     (t
+      (let ((idx 0)
+            msg
+            ref next-ref-function compare-ref-function mismatched-ref-function)
+
+        (if bounds-ref
+            (setq ref bounds-ref
+                  next-ref-function (lambda (x) (cddr x))
+                  compare-ref-function (lambda (ref start-pos end-pos)
+                                         (or (eq (car ref) t)
+                                             (and (eq start-pos (car ref))
+                                                  (eq end-pos   (cadr ref)))))
+                  mismatched-ref-function (lambda (ref start-pos end-pos)
+                                            (format
+                                             "beginning/end positions: %d/%s and %d/%s"
+                                             start-pos (car ref) end-pos (cadr ref))))
+          (setq ref substring-ref
+                next-ref-function (lambda (x) (cdr x))
+                compare-ref-function (lambda (ref start-pos end-pos)
+                                       (or (eq (car ref) t)
+                                           (string= (substring string start-pos end-pos) (car ref))))
+                mismatched-ref-function (lambda (ref start-pos end-pos)
+                                          (format
+                                           "substring: got '%s', expected '%s'"
+                                           (substring string start-pos end-pos) (car ref)))))
+
+        (while (not (or (null ref) msg))
+
+          (let ((start (match-beginning idx))
+                (end   (match-end       idx)))
+
+            (when (not (funcall compare-ref-function ref start end))
+              (setq msg
+                    (format
+                     "Have expected match, but mismatch in group %d: %s" idx (funcall mismatched-ref-function ref start end))))
+
+            (setq ref (funcall next-ref-function ref)
+                  idx (1+ idx))))
+
+        (or msg
+            nil))))))
+
+
+
+(defun regex-tests-match (pattern string bounds-ref &optional substring-ref)
+  "I match the given STRING against PATTERN.  I compare the
+beginning/end of each group with their expected values.
+BOUNDS-REF is a sequence [start-ref0 end-ref0 start-ref1 end-ref1
+....].
+
+If the search was supposed to fail then start-ref0 is
+'search-failed.  If the search wasn't even supposed to compile
+successfully, then start-ref0 is 'compilation-failed.
+
+This function returns a string that describes the failure, or nil
+on success"
+
+  (if (string-match "\\[\\([\\.=]\\)..?\\1\\]" pattern)
+      ;; Skipping test: [.x.] and [=x=] forms not supported by emacs
+      nil
+
+    (regex-tests-compare
+     string
+     (condition-case nil
+         (if (string-match pattern string) nil 'search-failed)
+       (invalid-regexp 'compilation-failed))
+     bounds-ref substring-ref)))
+
+
+(defconst regex-tests-re-even-escapes
+  "\\(?:^\\|[^\\\\]\\)\\(?:\\\\\\\\\\)*"
+  "Regex that matches an even number of \\ characters")
+
+(defconst regex-tests-re-odd-escapes
+  (concat regex-tests-re-even-escapes "\\\\")
+  "Regex that matches an odd number of \\ characters")
+
+
+(defun regex-tests-unextend (pattern)
+  "Basic conversion from extended regexen to emacs ones.  This is
+mostly a hack that adds \\ to () and | and {}, and removes it if
+it already exists.  We also change \\S (and \\s) to \\S- (and
+\\s-) because extended regexen see the former as whitespace, but
+emacs requires an extra symbol character"
+
+  (with-temp-buffer
+    (insert pattern)
+    (goto-char (point-min))
+
+    (while (re-search-forward "[()|{}]" nil t)
+      ;; point is past special character. If it is escaped, unescape
+      ;; it
+
+      (if (save-excursion
+            (re-search-backward (concat regex-tests-re-odd-escapes ".\\=") nil t))
+
+          ;; This special character is preceded by an odd number of \,
+          ;; so I unescape it by removing the last one
+          (progn
+            (forward-char -2)
+            (delete-char 1)
+            (forward-char 1))
+
+        ;; This special character is preceded by an even (possibly 0)
+        ;; number of \. I add an escape
+        (forward-char -1)
+        (insert "\\")
+        (forward-char 1)))
+
+    ;; convert \s to \s-
+    (goto-char (point-min))
+    (while (re-search-forward (concat regex-tests-re-odd-escapes "[Ss]") nil t)
+      (insert "-"))
+
+    (buffer-string)))
+
+(defun regex-tests-BOOST-frob-escapes (s ispattern)
+  "Mangle \\ the way it is done in frob_escapes() in
+regex-tests-BOOST.c in glibc: \\t, \\n, \\r are interpreted;
+\\\\, \\^, \\{, \\|, \\} are unescaped for the string (not
+pattern)"
+
+  ;; this is all similar to (regex-tests-unextend)
+  (with-temp-buffer
+    (insert s)
+
+    (let ((interpret-list (list "t" "n" "r")))
+      (while interpret-list
+        (goto-char (point-min))
+        (while (re-search-forward
+                (concat "\\(" regex-tests-re-even-escapes "\\)"
+                        "\\\\" (car interpret-list))
+                nil t)
+          (replace-match (concat "\\1" (car (read-from-string
+                                             (concat "\"\\" (car interpret-list) "\""))))))
+
+        (setq interpret-list (cdr interpret-list))))
+
+    (when (not ispattern)
+      ;; unescape \\, \^, \{, \|, \}
+      (let ((unescape-list (list "\\\\" "^" "{" "|" "}")))
+        (while unescape-list
+          (goto-char (point-min))
+          (while (re-search-forward
+                  (concat "\\(" regex-tests-re-even-escapes "\\)"
+                          "\\\\" (car unescape-list))
+                  nil t)
+            (replace-match (concat "\\1" (car unescape-list))))
+
+          (setq unescape-list (cdr unescape-list))))
+      )
+    (buffer-string)))
+
+
+
+
+(defconst regex-tests-BOOST-whitelist
+  [
+   ;; emacs is more stringent with regexen involving unbalanced )
+   63 65 69
+
+   ;; in emacs, regex . doesn't match \n
+   91
+
+   ;; emacs is more forgiving with * and ? that don't apply to
+   ;; characters
+   107 108 109 122 123 124 140 141 142
+
+   ;; emacs accepts regexen with {}
+   161
+
+   ;; emacs doesn't fail on bogus ranges such as [3-1] or [1-3-5]
+   222 223
+
+   ;; emacs doesn't match (ab*)[ab]*\1 greedily: only 4 chars of
+   ;; ababaaa match
+   284 294
+
+   ;; ambiguous groupings are ambiguous
+   443 444 445 446 448 449 450
+
+   ;; emacs doesn't know how to handle weird ranges such as [a-Z] and
+   ;; [[:alpha:]-a]
+   539 580 581
+
+   ;; emacs matches non-greedy regex ab.*? non-greedily
+   639 677 712
+   ]
+  "Line numbers in the boost test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - Comments are lines starting with ;
+;;   - Lines starting with - set options passed to regcomp() and regexec():
+;;     - if no "REG_BASIC" is found, we have an extended regex
+;;     - These set a flag:
+;;       - REG_ICASE
+;;       - REG_NEWLINE
+;;       - REG_NOTBOL
+;;       - REG_NOTEOL
+;;
+;;   - Test lines are
+;;     pattern string start0 end0 start1 end1 ...
+;;
+;;   - pattern, string can have escapes
+;;   - string can have whitespace if enclosed in ""
+;;   - if string is "!", then the pattern is supposed to fail compilation
+;;   - start/end are of group0, group1, etc. group 0 is the full match
+;;   - start<0 indicates "no match"
+;;   - start is the 0-based index of the first character
+;;   - end   is the 0-based index of the first character past the group
+(defun regex-tests-BOOST ()
+  (let (failures
+        basic icase newline notbol noteol)
+    (regex-tests-generic-line
+     ?; "regex-resources/BOOST.tests" regex-tests-BOOST-whitelist
+     (if (save-excursion (re-search-forward "^-" nil t))
+         (setq basic   (save-excursion (re-search-forward "REG_BASIC" nil t))
+               icase   (save-excursion (re-search-forward "REG_ICASE" nil t))
+               newline (save-excursion (re-search-forward "REG_NEWLINE" nil t))
+               notbol  (save-excursion (re-search-forward "REG_NOTBOL" nil t))
+               noteol  (save-excursion (re-search-forward "REG_NOTEOL" nil t)))
+
+       (save-excursion
+         (or (re-search-forward "\\(\\S-+\\)\\s-+\"\\(.*\\)\"\\s-+?\\(.+\\)" nil t)
+             (re-search-forward "\\(\\S-+\\)\\s-+\\(\\S-+\\)\\s-+?\\(.+\\)"  nil t)
+             (re-search-forward "\\(\\S-+\\)\\s-+\\(!\\)"                    nil t)))
+
+       (let* ((pattern-raw   (match-string 1))
+              (string-raw    (match-string 2))
+              (positions-raw (match-string 3))
+              (pattern (regex-tests-BOOST-frob-escapes pattern-raw t))
+              (string  (regex-tests-BOOST-frob-escapes string-raw  nil))
+              (positions
+               (if (string= string "!")
+                   (list 'compilation-failed 0)
+                 (mapcar
+                  (lambda (x)
+                    (let ((x (string-to-number x)))
+                      (if (< x 0) nil x)))
+                  (split-string positions-raw)))))
+
+         (when (null (car positions))
+           (setcar positions 'search-failed))
+
+         (when (not basic)
+           (setq pattern (regex-tests-unextend pattern)))
+
+         ;; great. I now have all the data parsed. Let's use it to do
+         ;; stuff
+         (let* ((case-fold-search icase)
+                (msg (regex-tests-match pattern string positions)))
+
+           (if (and
+                ;; Skipping test: notbol/noteol not supported
+                (not notbol) (not noteol)
+
+                msg)
+
+               ;; store failure
+               (setq failures
+                     (cons (format "line number %d: Regex '%s': %s"
+                                   line-number pattern msg)
+                           failures)))))))
+
+    failures))
+
+(defconst regex-tests-PCRE-whitelist
+  [
+   ;; ambiguous groupings are ambiguous
+   610 611 1154 1157 1160 1168 1171 1176 1179 1182 1185 1188 1193 1196 1203
+  ]
+  "Line numbers in the PCRE test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;
+;;  regex
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  ...
+(defun regex-tests-PCRE ()
+  (let (failures
+        pattern icase string what-failed matches-observed)
+    (regex-tests-generic-line
+     ?# "regex-resources/PCRE.tests" regex-tests-PCRE-whitelist
+
+     (cond
+
+      ;; pattern
+      ((save-excursion (re-search-forward "^/\\(.*\\)/\\(.*i?\\)$" nil t))
+       (setq icase (string= "i" (match-string 2))
+             pattern (regex-tests-unextend (match-string 1))))
+
+      ;; string. read it in, match against pattern, and save all the results
+      ((save-excursion (re-search-forward "^    \\(.*\\)" nil t))
+       (let ((case-fold-search icase))
+         (setq string (match-string 1)
+
+               ;; the regex match under test
+               what-failed
+               (condition-case nil
+                   (if (string-match pattern string) nil 'search-failed)
+                 (invalid-regexp 'compilation-failed))
+
+               matches-observed
+               (loop for x from 0 to 20
+                     collect (and (not what-failed)
+                                  (or (match-string x string) "<unset>")))))
+       nil)
+
+      ;; verification line: failed match
+      ((save-excursion (re-search-forward "^No match" nil t))
+       (unless what-failed
+         (setq failures
+               (cons (format "line number %d: Regex '%s': Expected no match; but match"
+                             line-number pattern)
+                     failures))))
+
+      ;; verification line: succeeded match
+      ((save-excursion (re-search-forward "^ *\\([0-9]+\\): \\(.*\\)" nil t))
+       (let* ((match-ref (match-string 2))
+              (idx       (string-to-number (match-string 1))))
+
+         (if what-failed
+             "Expected match; but no match"
+           (unless (string= match-ref (elt matches-observed idx))
+             (setq failures
+                   (cons (format "line number %d: Regex '%s': Have expected match, but group %d is wrong: '%s'/'%s'"
+                                 line-number pattern
+                                 idx match-ref (elt matches-observed idx))
+                         failures))))))
+
+      ;; reset
+      (t (setq pattern nil) nil)))
+
+    failures))
+
+(defconst regex-tests-PTESTS-whitelist
+  [
+   ;; emacs doesn't barf on weird ranges such as [b-a], but simply
+   ;; fails to match
+   138
+
+   ;; emacs doesn't see DEL (0x7f) as a [:cntrl:] character
+   168
+  ]
+  "Line numbers in the PTESTS test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - fields separated by ¦ (note: this is not a |)
+;;   - start¦end¦pattern¦string
+;;   - start is the 1-based index of the first character
+;;   - end   is the 1-based index of the last  character
+(defun regex-tests-PTESTS ()
+  (let (failures)
+    (regex-tests-generic-line
+     ?# "regex-resources/PTESTS" regex-tests-PTESTS-whitelist
+     (let* ((fields (split-string (buffer-string) "¦"))
+
+            ;; string has 1-based index of first char in the
+            ;; match. -1 means "no match". -2 means "invalid
+            ;; regex".
+            ;;
+            ;; start-ref is 0-based index of first char in the
+            ;; match
+            ;;
+            ;; string==0 is a special case, and I have to treat
+            ;; it as start-ref = 0
+            (start-ref (let ((raw (string-to-number (elt fields 0))))
+                         (cond
+                          ((= raw -2) 'compilation-failed)
+                          ((= raw -1) 'search-failed)
+                          ((= raw  0) 0)
+                          (t          (1- raw)))))
+
+            ;; string has 1-based index of last char in the
+            ;; match. end-ref is 0-based index of first char past
+            ;; the match
+            (end-ref   (string-to-number (elt fields 1)))
+            (pattern   (elt fields 2))
+            (string    (elt fields 3)))
+
+       (let ((msg (regex-tests-match pattern string (list start-ref end-ref))))
+         (when msg
+           (setq failures
+                 (cons (format "line number %d: Regex '%s': %s"
+                               line-number pattern msg)
+                       failures))))))
+    failures))
+
+(defconst regex-tests-TESTS-whitelist
+  [
+   ;; emacs doesn't barf on weird ranges such as [b-a], but simply
+   ;; fails to match
+   42
+
+   ;; emacs is more forgiving with * and ? that don't apply to
+   ;; characters
+   57 58 59 60
+
+   ;; emacs is more stringent with regexen involving unbalanced )
+   67
+  ]
+  "Line numbers in the TESTS test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - fields separated by :. Watch for [\[:xxx:]]
+;;   - expected:pattern:string
+;;
+;;   expected:
+;;   | 0 | successful match      |
+;;   | 1 | failed match          |
+;;   | 2 | regcomp() should fail |
+(defun regex-tests-TESTS ()
+  (let (failures)
+    (regex-tests-generic-line
+     ?# "regex-resources/TESTS" regex-tests-TESTS-whitelist
+     (if (save-excursion (re-search-forward "^\\([^:]+\\):\\(.*\\):\\([^:]*\\)$" nil t))
+         (let* ((what-failed
+                 (let ((raw (string-to-number (match-string 1))))
+                   (cond
+                    ((= raw 2) 'compilation-failed)
+                    ((= raw 1) 'search-failed)
+                    (t         t))))
+                (string  (match-string 3))
+                (pattern (regex-tests-unextend (match-string 2))))
+
+           (let ((msg (regex-tests-match pattern string nil (list what-failed))))
+             (when msg
+               (setq failures
+                     (cons (format "line number %d: Regex '%s': %s"
+                                   line-number pattern msg)
+                           failures)))))
+
+       (error "Error parsing TESTS file line: '%s'" (buffer-string))))
+    failures))
+
+(ert-deftest regex-tests ()
+  "Tests of the Emacs regular expression engine.  This evaluates
+the BOOST, PCRE, PTESTS and TESTS test cases from glibc."
+  (should-not (regex-tests-BOOST))
+  (should-not (regex-tests-PCRE))
+  (should-not (regex-tests-PTESTS))
+  (should-not (regex-tests-TESTS)))
-- 
2.1.4



* Re: Embedded modifiers in the regex engine
  2016-03-28  0:23                   ` Dima Kogan
@ 2016-03-28 15:28                     ` Eli Zaretskii
  2016-03-28 18:05                       ` Dima Kogan
  2016-03-28 16:09                     ` Stefan Monnier
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-03-28 15:28 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <dima@secretsauce.net>
> Date: Sun, 27 Mar 2016 17:23:28 -0700
> 
> Sorry for the delay. Attached are two patches to import most of glibc's
> regex tests to emacs. Assuming these are acceptable to be merged into
> our tree, what specifically are people thinking in terms of updates to
> the regex engine? Is there a particular implementation that we were
> considering?

What's wrong with the existing one?

> --- /dev/null
> +++ b/test/src/regex/regex-resources/BOOST.tests
> @@ -0,0 +1,829 @@
> +; 
> +; 
> +; this file contains a script of tests to run through regress.exe

What is "regress.exe"?

Also, please provide log messages for the changes.

Thanks.




* Re: Embedded modifiers in the regex engine
  2016-03-28  0:23                   ` Dima Kogan
  2016-03-28 15:28                     ` Eli Zaretskii
@ 2016-03-28 16:09                     ` Stefan Monnier
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Monnier @ 2016-03-28 16:09 UTC (permalink / raw)
  To: emacs-devel

> regex tests to Emacs.  Assuming these are acceptable to be merged into
> our tree, what specifically are people thinking in terms of updates to
> the regex engine?

Number 1 in my book: get rid of the worst case that we encounter with
regexps like ".*.*a".
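
A rough way to observe that cost on a given build is sketched below. It
assumes nothing beyond stock Emacs; `my-regex-time' is a hypothetical helper
name, and no timings are quoted because they depend on the machine and build.

```elisp
(require 'benchmark)

(defun my-regex-time (n)
  "Return the seconds taken to fail matching \".*.*a\" against N copies of \"b\".
Hypothetical helper, for illustration only."
  (let ((subject (make-string n ?b)))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME); keep ELAPSED.
    (car (benchmark-run 1 (string-match ".*.*a" subject)))))

;; Compare how the elapsed time grows as the subject doubles in length.
(list (my-regex-time 100) (my-regex-time 200) (my-regex-time 400))
```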


        Stefan





* Re: Embedded modifiers in the regex engine
  2016-03-28 15:28                     ` Eli Zaretskii
@ 2016-03-28 18:05                       ` Dima Kogan
  2016-03-28 18:24                         ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Dima Kogan @ 2016-03-28 18:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Dima Kogan <dima@secretsauce.net>
>> Date: Sun, 27 Mar 2016 17:23:28 -0700
>> 
>> Sorry for the delay. Attached are two patches to import most of glibc's
>> regex tests to emacs. Assuming these are acceptable to be merged into
>> our tree, what specifically are people thinking in terms of updates to
>> the regex engine? Is there a particular implementation that we were
>> considering?
>
> What's wrong with the existing one?

Maybe nothing. I'd like to add a feature, but if there are plans to
abandon the current implementation, then I want to add my feature to
whatever the new implementation is. I've seen earlier posts on the list
discussing such changes, and it wasn't clear if anybody had specific
plans, or if there was consensus.


>> --- /dev/null
>> +++ b/test/src/regex/regex-resources/BOOST.tests
>> @@ -0,0 +1,829 @@
>> +; 
>> +; 
>> +; this file contains a script of tests to run through regress.exe
>
> What is "regress.exe"?

These test definitions came verbatim from the glibc sources. I guess the
BOOST.tests comment mentions regress.exe. glibc doesn't have it, and
neither do we.


> Also, please provide log messages for the changes.

What kind of log messages? Git logs? These are terse because this
project seems to have strict guidelines, and other messages I see are
generally very terse. What would you like?




* Re: Embedded modifiers in the regex engine
  2016-03-28 18:05                       ` Dima Kogan
@ 2016-03-28 18:24                         ` Eli Zaretskii
  2016-03-29 21:23                           ` Dima Kogan
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-03-28 18:24 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <lists@dima.secretsauce.net>
> Cc: emacs-devel@gnu.org
> Date: Mon, 28 Mar 2016 11:05:02 -0700
> 
> > What's wrong with the existing one?
> 
> Maybe nothing. I'd like to add a feature, but if there are plans to
> abandon the current implementation, then I want to add my feature to
> whatever the new implementation is. I've seen earlier posts on the list
> discussing such changes, and it wasn't clear if anybody had specific
> plans, or if there was consensus.

AFAIK, no one works on replacing the current implementation.  So we
should talk about the changes first, and worry about replacing the
current code later.

> > Also, please provide log messages for the changes.
> 
> What kind of log messages? Git logs? These are terse because this
> project seems to have strict guidelines, and other messages I see are
> generally very terse. What would you like?

See CONTRIBUTE: they should be Git log messages in the ChangeLog
format.




* Re: Embedded modifiers in the regex engine
  2016-03-28 18:24                         ` Eli Zaretskii
@ 2016-03-29 21:23                           ` Dima Kogan
  2016-03-31 16:28                             ` Eli Zaretskii
  2016-04-23  4:17                             ` Dima Kogan
  0 siblings, 2 replies; 22+ messages in thread
From: Dima Kogan @ 2016-03-29 21:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> > What's wrong with the existing one?
>> 
>> Maybe nothing. I'd like to add a feature, but if there are plans to
>> abandon the current implementation, then I want to add my feature to
>> whatever the new implementation is.
>
> AFAIK, no one works on replacing the current implementation.  So we
> should talk about the changes first, and worry about replacing the
> current code later.

OK. I'll go back to look at my earlier work to add the feature to the
current implementation.


>> > Also, please provide log messages for the changes.
>> 
>> What kind of log messages? Git logs? These are terse because this
>> project seems to have strict guidelines, and other messages I see are
>> generally very terse. What would you like?
>
> See CONTRIBUTE: they should be Git log messages in the ChangeLog
> format.

Can you please be more specific about what is missing in the patches I
sent?

Thanks
dima




* Re: Embedded modifiers in the regex engine
  2016-03-29 21:23                           ` Dima Kogan
@ 2016-03-31 16:28                             ` Eli Zaretskii
  2016-04-23  4:17                             ` Dima Kogan
  1 sibling, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2016-03-31 16:28 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <lists@dima.secretsauce.net>
> Cc: emacs-devel@gnu.org
> Date: Tue, 29 Mar 2016 14:23:11 -0700
> 
> >> > Also, please provide log messages for the changes.
> >> 
> >> What kind of log messages? Git logs? These are terse because this
> >> project seems to have strict guidelines, and other messages I see are
> >> generally very terse. What would you like?
> >
> > See CONTRIBUTE: they should be Git log messages in the ChangeLog
> > format.
> 
> Can you please be more specific about what is missing in the patches I
> sent?

A commit log message, formatted similarly to this one:

    Adapt filenotify-tests.el according latest tests

    * test/lisp/filenotify-tests.el (file-notify-test02-events)
    (file-notify-test04-file-validity, file-notify-test05-dir-validity):
    Remove superfluous `read-event' calls.
    (file-notify-test02-events): Expect different events under MS
    Windows 7 and 10.
    (file-notify-test04-file-validity): Move `file-notify-valid-p'
    check up.




* Re: Embedded modifiers in the regex engine
  2016-03-29 21:23                           ` Dima Kogan
  2016-03-31 16:28                             ` Eli Zaretskii
@ 2016-04-23  4:17                             ` Dima Kogan
  2016-04-23  7:48                               ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Dima Kogan @ 2016-04-23  4:17 UTC (permalink / raw)
  To: Dima Kogan; +Cc: Eli Zaretskii, emacs-devel

Dima Kogan <lists@dima.secretsauce.net> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> > What's wrong with the existing one?
>>> 
>>> Maybe nothing. I'd like to add a feature, but if there are plans to
>>> abandon the current implementation, then I want to add my feature to
>>> whatever the new implementation is.
>>
>> AFAIK, no one works on replacing the current implementation.  So we
>> should talk about the changes first, and worry about replacing the
>> current code later.
>
> OK. I'll go back to look at my earlier work to add the feature to the
> current implementation.

Sorry for the delay. An initial implementation of the case-fold embedded
modifiers lives at

  https://github.com/dkogan/emacs-snapshot/tree/regex_embedded_modifiers

That tree contains the implementation and the tests. This is not
intended to be a final implementation, but should be sufficient to get a
comment from the list. If this looks like something we're not going to
want to merge, I'd like to know before I put more work into it. Most
things should work as one would expect. Some fancier regexen probably do
not work yet; more tests should be written to test more cases. Maybe the
tests from the perl project should be imported into the suite in
addition to the glibc ones.

Clearly, this is not a small patch, but the test suite should hopefully
serve as some assurance that this doesn't break things (too badly).

The new code adds two patterns the regex engine understands:

   \(?i\)   to turn on case-fold
   \(?-i\)  to turn off case-fold

These are active until another such pattern is encountered, or until the
end of a () group. This is exactly how these work in perl. Before any of
these is encountered, the value of `case-fold-search' is used, so the
previous behavior should be preserved.
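
A minimal sketch of the difference, as an illustration rather than code from
the branch: stock Emacs selects case folding by binding `case-fold-search',
which is what the runnable part below does. `my-match-p' is a hypothetical
helper. The embedded forms appear only in comments, since they are understood
only by the branch and not by released Emacs.

```elisp
;; Today: case folding is selected by dynamically binding `case-fold-search'.
(defun my-match-p (regexp string fold)
  "Return t if REGEXP matches STRING, with case folding when FOLD is non-nil.
Hypothetical helper, not part of the patch."
  (let ((case-fold-search fold))
    (and (string-match regexp string) t)))

(my-match-p "asdf" "ASDF" t)    ; folding on
(my-match-p "asdf" "ASDF" nil)  ; folding off

;; With the branch (assumption: not available in stock Emacs), the pattern
;; itself would carry the mode, whatever `case-fold-search' says.  Written
;; as Lisp strings, following the description above:
;;   "\\(?i\\)asdf"           fold case for the whole pattern
;;   "\\(?i\\)a\\(?-i\\)b"    fold "a" but not "b"
;;   "\\(a\\(?i\\)b\\)c"      fold only "b"; the effect ends with the group
```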

In the code, most functions previously accepted a `translate' argument
that was NULL to indicate that no case-fold is desired. In the new code
this argument is always non-NULL, and an additional case-fold argument
indicates the initial state.

The larger goal here, in my mind, is to add modifiers for all the
various switches that we have in isearch: lax-whitespace, char-fold,
symbol-search, word-search, etc. I can imagine this being useful for
various things, in particular making search histories work nicer, making
hi-lock simpler and so on.

Thanks




* Re: Embedded modifiers in the regex engine
  2016-04-23  4:17                             ` Dima Kogan
@ 2016-04-23  7:48                               ` Eli Zaretskii
  2016-04-24  7:34                                 ` Dima Kogan
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2016-04-23  7:48 UTC (permalink / raw)
  To: Dima Kogan; +Cc: emacs-devel

> From: Dima Kogan <lists@dima.secretsauce.net>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> Date: Fri, 22 Apr 2016 22:17:08 -0600
> 
> Sorry for the delay. An initial implementation of the case-fold embedded
> modifiers lives at
> 
>   https://github.com/dkogan/emacs-snapshot/tree/regex_embedded_modifiers

I suggest to push a branch to the Emacs repository with this code, it
will allow more people to try that.

Thanks.




* Re: Embedded modifiers in the regex engine
  2016-04-23  7:48                               ` Eli Zaretskii
@ 2016-04-24  7:34                                 ` Dima Kogan
  0 siblings, 0 replies; 22+ messages in thread
From: Dima Kogan @ 2016-04-24  7:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Dima Kogan <lists@dima.secretsauce.net>
>> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
>> Date: Fri, 22 Apr 2016 22:17:08 -0600
>> 
>> Sorry for the delay. An initial implementation of the case-fold embedded
>> modifiers lives at
>> 
>>   https://github.com/dkogan/emacs-snapshot/tree/regex_embedded_modifiers
>
> I suggest to push a branch to the Emacs repository with this code, it
> will allow more people to try that.

OK. The code is in a new "dima_regex_embedded_modifiers" branch in the
Emacs repository.




Thread overview: 22+ messages
2016-02-25  1:32 Embedded modifiers in the regex engine Dima Kogan
2016-02-25  6:11 ` John Wiegley
2016-02-25 21:05   ` Stefan Monnier
2016-02-25 16:15 ` Eli Zaretskii
2016-02-26  7:19   ` Dima Kogan
2016-02-26  9:08     ` Eli Zaretskii
2016-02-28  1:50       ` Dima Kogan
2016-02-28 16:00         ` Eli Zaretskii
2016-02-29 13:30           ` Richard Stallman
2016-03-01  0:49             ` Aurélien Aptel
2016-03-01 16:55               ` Richard Stallman
2016-03-09  0:34                 ` Dima Kogan
2016-03-28  0:23                   ` Dima Kogan
2016-03-28 15:28                     ` Eli Zaretskii
2016-03-28 18:05                       ` Dima Kogan
2016-03-28 18:24                         ` Eli Zaretskii
2016-03-29 21:23                           ` Dima Kogan
2016-03-31 16:28                             ` Eli Zaretskii
2016-04-23  4:17                             ` Dima Kogan
2016-04-23  7:48                               ` Eli Zaretskii
2016-04-24  7:34                                 ` Dima Kogan
2016-03-28 16:09                     ` Stefan Monnier