* regexp-quote bugs
@ 2004-08-22 1:43 Kevin Ryde
2004-08-24 17:23 ` Marius Vollmer
0 siblings, 1 reply; 4+ messages in thread
From: Kevin Ryde @ 2004-08-22 1:43 UTC (permalink / raw)
* regex-posix.c (scm_regexp_quote): Rewrite of ice-9 regex
regexp-quote in C. Fix [ and |, they must be quoted. Fix quoting of
( ) { + ? for regexp/basic, must use char class [(] etc since \( etc
in fact makes them become special.
This is for 1.6 too.
I'm assuming regexp-quote is meant to quote for both regexp/basic and
regexp/extended usages. At the moment it's got problems in both.
New code below (have to find the posix spec to double check what needs
to be done, but this works with glibc for a start).
SCM_DEFINE (scm_regexp_quote, "regexp-quote", 1, 0, 0,
            (SCM str),
            "Return a regexp string which matches @var{str} literally, ie.@:\n"
            "any characters like @samp{*} in @var{str} which are special in\n"
            "a regexp are quoted.  If there are no special characters then\n"
            "@var{str} itself is returned.\n"
            "\n"
            "The regexp returned can be used with both @code{regexp/basic}\n"
            "and @code{regexp/extended}, the quoting applied is safe for\n"
            "both styles.")
#define FUNC_NAME s_scm_regexp_quote
{
  size_t i, j, len, newlen;
  const char *ptr;
  char *newptr;
  SCM newstr;

  SCM_VALIDATE_STRING (SCM_ARG1, str);
  ptr = scm_i_string_chars (str);
  len = scm_i_string_length (str);

  /* [ * . \ ^ and $ are special in both regexp/basic and regexp/extended
     and can be backslash escaped.

     ( ) { } + ? and | are special in regexp/extended so must be escaped.
     But that can't be done with a backslash since in regexp/basic sequences
     \( \) \{ \} \+ \? and \| are special.  Character class forms [(] etc
     are used instead.

     ] is not special outside a [ ] character class, so doesn't need to be
     escaped.  */

#define REGEXP_QUOTE_BACKSLASH  \
  case '[':                     \
  case '*':                     \
  case '.':                     \
  case '\\':                    \
  case '^':                     \
  case '$'

#define REGEXP_QUOTE_CHARCLASS  \
  case '(':                     \
  case ')':                     \
  case '{':                     \
  case '}':                     \
  case '+':                     \
  case '?':                     \
  case '|'

  for (i = 0, newlen = 0; i < len; i++)
    {
      switch (ptr[i]) {
      REGEXP_QUOTE_BACKSLASH:
        newlen += 2;
        break;
      REGEXP_QUOTE_CHARCLASS:
        newlen += 3;
        break;
      default:
        newlen += 1;
        break;
      }
    }

  if (newlen == len)
    return str;

  newstr = scm_i_make_string (newlen, &newptr);

  for (i = 0, j = 0; i < len; i++, j++)
    {
      char c = ptr[i];
      switch (c) {
      REGEXP_QUOTE_BACKSLASH:
        newptr[j++] = '\\';
        goto store_c;
      REGEXP_QUOTE_CHARCLASS:
        newptr[j++] = '[';
        newptr[j++] = c;
        newptr[j] = ']';
        break;
      default:
      store_c:
        newptr[j] = c;
        break;
      }
    }

  scm_remember_upto_here_1 (str);
  return newstr;
}
#undef FUNC_NAME
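
For anyone wanting to try the quoting rules outside of Guile, here is a rough standalone sketch in plain C. It is not the patch itself: quote_regexp and quoted_matches are made-up names for illustration, the result is malloc'd rather than a Guile string, and it assumes a POSIX <regex.h> whose regcomp treats a flags value of 0 as basic syntax and REG_EXTENDED as extended. It applies the same two classes described in the comment above (backslash for [ * . \ ^ $ and a one-character bracket expression for ( ) { } + ? |) and then checks the result by compiling it in either style.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Guile.  Return a malloc'd copy of STR
   with regexp specials quoted: backslash for [ * . \ ^ $ and a
   one-character bracket expression for ( ) { } + ? |.  Caller frees.
   Returns NULL if malloc fails.  */
static char *
quote_regexp (const char *str)
{
  size_t len = strlen (str), i, j = 0;
  /* Worst case is 3 output bytes per input byte, plus the NUL.  */
  char *out = malloc (3 * len + 1);
  if (out == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    {
      char c = str[i];
      if (strchr ("[*.\\^$", c) != NULL)
        {
          out[j++] = '\\';
          out[j++] = c;
        }
      else if (strchr ("(){}+?|", c) != NULL)
        {
          out[j++] = '[';
          out[j++] = c;
          out[j++] = ']';
        }
      else
        out[j++] = c;
    }
  assert (j <= 3 * len);
  out[j] = '\0';
  return out;
}

/* Hypothetical helper.  Quote LITERAL, anchor it with ^ and $, compile it
   with CFLAGS (0 for basic, REG_EXTENDED for extended) and try it on
   SUBJECT.  Returns 1 on a match, 0 on no match, -1 on any error.  */
static int
quoted_matches (const char *literal, const char *subject, int cflags)
{
  regex_t re;
  int result;
  char *pattern;
  char *quoted = quote_regexp (literal);
  if (quoted == NULL)
    return -1;
  pattern = malloc (strlen (quoted) + 3);
  if (pattern == NULL)
    {
      free (quoted);
      return -1;
    }
  strcpy (pattern, "^");
  strcat (pattern, quoted);
  strcat (pattern, "$");
  free (quoted);
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    {
      free (pattern);
      return -1;
    }
  result = (regexec (&re, subject, 0, NULL, 0) == 0);
  regfree (&re);
  free (pattern);
  return result;
}
```

Calling quoted_matches with the same string as both literal and subject, once with 0 and once with REG_EXTENDED, is a quick way to see whether a given libc agrees that the quoting is safe for both styles.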
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
* Re: regexp-quote bugs
2004-08-22 1:43 regexp-quote bugs Kevin Ryde
@ 2004-08-24 17:23 ` Marius Vollmer
2004-08-25 1:15 ` Kevin Ryde
0 siblings, 1 reply; 4+ messages in thread
From: Marius Vollmer @ 2004-08-24 17:23 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> * regex-posix.c (scm_regexp_quote): Rewrite of ice-9 regex
> regexp-quote in C. Fix [ and |, they must be quoted. Fix quoting of
> ( ) { + ? for regexp/basic, must use char class [(] etc since \( etc
> in fact makes them become special.
>
> This is for 1.6 too.
Yes.
Why did you rewrite it in C?
* Re: regexp-quote bugs
2004-08-24 17:23 ` Marius Vollmer
@ 2004-08-25 1:15 ` Kevin Ryde
2004-09-07 16:09 ` Marius Vollmer
0 siblings, 1 reply; 4+ messages in thread
From: Kevin Ryde @ 2004-08-25 1:15 UTC (permalink / raw)
Cc: guile-devel
Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:
>
> Why did you rewrite it in C?
I figured it'd be a good speedup. Might just chuck something into the
scheme code to start with though.
* Re: regexp-quote bugs
2004-08-25 1:15 ` Kevin Ryde
@ 2004-09-07 16:09 ` Marius Vollmer
0 siblings, 0 replies; 4+ messages in thread
From: Marius Vollmer @ 2004-09-07 16:09 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:
>>
>> Why did you rewrite it in C?
>
> I figured it'd be a good speedup.
Yes, maybe, but with Unicode coming up and all that, it might be
easier to keep it in Scheme. By using the internal C string
interface, we have one more blob of non-trivial code to maintain when
that interface changes.
(The code right now is subtly buggy, already: it uses the result of
scm_i_string_chars across scm_i_make_string and the latter allows
other threads to change the internal representation of strings, thus
invalidating the pointer returned from scm_i_string_chars.)
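
The general shape of that hazard, and one way around it, can be sketched in plain C without any libguile calls. Everything here is hypothetical and only an analogy: struct shared_string stands in for a string object whose buffer may be swapped, replace_chars stands in for whatever changes the representation, and snapshot_chars copies the bytes out so that later work never reads through the old pointer. It assumes the copy itself is made while the buffer is stable, and it does no locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object whose buffer may be replaced
   behind the caller's back.  Not a libguile type.  */
struct shared_string
{
  char *chars;
  size_t len;
};

/* Copy the bytes out into private storage.  Returns a malloc'd,
   NUL-terminated copy and stores its length in *LEN_OUT, or returns
   NULL if malloc fails.  Caller frees.  */
static char *
snapshot_chars (const struct shared_string *s, size_t *len_out)
{
  size_t len;
  char *copy;
  assert (s != NULL && len_out != NULL);
  len = s->len;
  copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  if (len > 0)
    memcpy (copy, s->chars, len);
  copy[len] = '\0';
  *len_out = len;
  return copy;
}

/* Hypothetical: swap in a new buffer, as a representation change might.
   Any pointer previously read from S->chars is dangling afterwards.
   Returns 0 on success, -1 if malloc fails.  */
static int
replace_chars (struct shared_string *s, const char *text)
{
  size_t len = strlen (text);
  char *fresh = malloc (len + 1);
  if (fresh == NULL)
    return -1;
  memcpy (fresh, text, len + 1);
  free (s->chars);
  s->chars = fresh;
  s->len = len;
  return 0;
}
```

The snapshot costs an extra copy, which works against the speedup that motivated the C version in the first place; that trade-off is part of why the Scheme route looks simpler here.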
> Might just chuck something into the scheme code to start with
> though.
Yes, I think that would be best right now. Could you make the quoting
changes to the Scheme code, both 1.7 and 1.6? Thanks!