* bug#36251: Regex library doesn't recognize ']' in a character class
@ 2019-06-16 17:16 Abdulrahman Semrie
2019-06-16 19:40 ` tomas
2019-06-18 11:08 ` Mark H Weaver
0 siblings, 2 replies; 5+ messages in thread
From: Abdulrahman Semrie @ 2019-06-16 17:16 UTC (permalink / raw)
To: 36251
I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function call:
(string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.
Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.
—
Regards,
Abdulrahman Semrie
* bug#36251: Regex library doesn't recognize ']' in a character class
From: tomas @ 2019-06-16 19:40 UTC (permalink / raw)
To: 36251
On Sun, Jun 16, 2019 at 08:16:29PM +0300, Abdulrahman Semrie wrote:
>
> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function call:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.
>
> Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.
If I understood you correctly, you are using POSIX regular
expressions. Within a bracket expression ([...]), you can't
escape ']' with a backslash. Just put the ] as the first character,
like so:
[][a-zA-Z]
Quoting the man page (regex(7)):
A bracket expression is a list of characters enclosed in "[]".
It normally matches any single character from the list (but see
below). If the list begins with '^', it matches any single
character (but see below) not from the rest of the list. [...]
To include a literal ']' in the list, make it the first
character (following a possible '^'). To include a literal
'-', make it the first or last character, or the second endpoint
of a range [...]
See also [1], but the man page is more complete.
(I'm assuming your Guile is linked against some POSIX regex library).
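To make the rule quoted above concrete, here is a minimal sketch in Guile. It assumes the (ice-9 regex) module on top of a POSIX regex library; the helper name match-or-false is made up for illustration and is not part of Guile.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: return the matched substring, or #f when
;; string-match finds nothing.
(define (match-or-false pattern str)
  (let ((m (string-match pattern str)))
    (and m (match:substring m))))

;; The pattern from the report, with backslashes inside the brackets.
;; The report says this call returns #f.
(match-or-false "[\\[\\]a-zA-Z]+" "Text[ab]")

;; POSIX style: put ']' first in the bracket expression.
(match-or-false "[][a-zA-Z]+" "Text[ab]")   ; => "Text[ab]"

;; After a leading '^', a ']' in first position is still literal,
;; so this matches everything up to the first ']'.
(match-or-false "[^]]+" "ab]cd")            ; => "ab"
```

The only change needed in the reported pattern is dropping the backslashes and moving ']' to the front of the list.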
Cheers
-- t
* bug#36251: Regex library doesn't recognize ']' in a character class
From: Mark H Weaver @ 2019-06-18 11:08 UTC (permalink / raw)
To: Abdulrahman Semrie; +Cc: 36251
Hi,
Abdulrahman Semrie <hsamireh@gmail.com> writes:
> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> right bracket in it. However, the string-match function doesn’t match
> the ‘]’ character. To demonstrate with an example, try the following
> function call:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result for the above function should have been a match structure
> with Text[ab] matched. However, the string-match returns #f which is
> incorrect. To test if the pattern I am using was right, I tried on
> regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is
> the link that demonstrates that it works.
It turns out that there are several flavors of regular expressions in
common use, with different features and syntax. The link you provided
is using PCRE (PHP) regular expressions (see the "flavor" pane on the
left), and there are three other supported flavors on that web site.
Guile's (ice-9 regex) module provides a simpler flavor of regexps known
as "POSIX extended regular expressions", implemented as a thin wrapper
around your system's POSIX regular expression library ('regcomp' and
'regexec'). The web site you referenced does not appear to support
POSIX extended regular expressions, but here are some links about them:
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
One of the notable differences is that in POSIX extended regular
expressions, character classes do not support backslash escapes, but
instead use a more ad-hoc approach as <tomas@tuxteam.de> described.
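One way to see this difference is to check what a backslash does inside a bracket expression. The sketch below assumes Guile's (ice-9 regex) on top of a POSIX library that follows the standard here (glibc does, as far as I know); the helper name matched is made up for illustration.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: matched substring, or #f if there is no match.
(define (matched pattern str)
  (let ((m (string-match pattern str)))
    (and m (match:substring m))))

;; The regex [\] is a bracket expression holding one ordinary
;; character, the backslash. It matches the backslash in a\b.
(matched "[\\]" "a\\b")                      ; => "\\"

;; The reported regex is [\[\]a-zA-Z]+ . Under POSIX rules the first
;; ']' closes the list, so it reads as:
;;   [\[\]     one character out of: backslash, '['
;;   a-zA-Z    the literal text a-zA-Z
;;   ]+        one or more literal ']'
;; That is why it matches this odd string but not "Text[ab]".
(matched "[\\[\\]a-zA-Z]+" "[a-zA-Z]]")      ; => "[a-zA-Z]]"
```

In a PCRE-style engine the same pattern would instead read as a single class containing '[', ']', and the letters, which is what regex101.com showed.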
Regards,
Mark
* bug#36251: Regex library doesn't recognize ']' in a character class
From: tomas @ 2019-06-18 11:20 UTC (permalink / raw)
To: Mark H Weaver; +Cc: Abdulrahman Semrie, 36251
On Tue, Jun 18, 2019 at 07:08:06AM -0400, Mark H Weaver wrote:
> Hi,
>
> Abdulrahman Semrie <hsamireh@gmail.com> writes:
>
> > I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> > right bracket in it [...]
> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax. The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
>
> Guile's (ice-9 regex) module provides a simpler flavor of regexps known
> as "POSIX extended regular expressions" [...]
D'oh! I forgot about Perl compatible regexps. In those, you /can/ escape
things with a backslash within [...]. This would have explained Abdulrahman's
confusion better.
Thanks, Mark
-- t
* bug#36251: Regex library doesn't recognize ']' in a character class
From: David Pirotte @ 2019-06-28 11:21 UTC (permalink / raw)
To: Mark H Weaver; +Cc: Abdulrahman Semrie, 36251
Hello,
> ...
> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax. The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
> ...
Fwiw, I just came across a pcre binding for guile(*), here:
https://github.com/NalaGinrut/guile-pcre-ffi
I didn't try it and I have no idea about the general quality and robustness of the
binding, last updated 4y ago it seems, but the code is really small, uses the ffi,
so it should be quite easy to patch if necessary and may be fun to 'resurrect' ...
David
(*) I found it while looking for something else, here:
http://sph.mn/foreign/guile-software.html