unofficial mirror of guile-devel@gnu.org 
* regexp-quote bugs
@ 2004-08-22  1:43 Kevin Ryde
From: Kevin Ryde @ 2004-08-22  1:43 UTC


        * regex-posix.c (scm_regexp_quote): Rewrite of ice-9 regex
        regexp-quote in C.  Fix [ and |, they must be quoted.  Fix quoting of
        ( ) { + ? for regexp/basic, must use char class [(] etc since \( etc
        in fact make them special.

This is for 1.6 too.

I'm assuming regexp-quote is meant to quote for both regexp/basic and
regexp/extended usages.  At the moment it's got problems in both.
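
To illustrate the backslash problem, here is a small check against the
POSIX regcomp/regexec interface.  It's only a sketch: pattern_matches is
a hypothetical helper made up for this example, not anything in Guile.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN compiles with CFLAGS and
   matches TEXT, 0 if it compiles but does not match, and -1 if it
   fails to compile.  CFLAGS is 0 for a basic regexp or REG_EXTENDED
   for an extended one.  */
static int
pattern_matches (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);

  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}
```

With this, the pattern ^\(a\)$ compiled as a basic regexp is a group
around "a", so it matches the string "a" but not "(a)", whereas as an
extended regexp the same pattern matches "(a)" literally.  The pattern
^[(]a[)]$ matches "(a)" in both styles.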

New code below (have to find the POSIX spec to double check what needs
to be done, but this works with glibc for a start).



SCM_DEFINE (scm_regexp_quote, "regexp-quote", 1, 0, 0,
            (SCM str),
	    "Return a regexp string which matches @var{str} literally, i.e.@:\n"
	    "any characters like @samp{*} in @var{str} which are special in\n"
	    "a regexp are quoted.  If there are no special characters then\n"
	    "@var{str} itself is returned.\n"
	    "\n"
	    "The regexp returned can be used with both @code{regexp/basic}\n"
	    "and @code{regexp/extended}; the quoting applied is safe for\n"
	    "both styles.")
#define FUNC_NAME s_scm_regexp_quote
{
  size_t     i, j, len, newlen;
  const char *ptr;
  char       *newptr;
  SCM        newstr;

  SCM_VALIDATE_STRING (SCM_ARG1, str);
  ptr = scm_i_string_chars (str);
  len = scm_i_string_length (str);

  /* [ * . \ ^ and $ are special in both regexp/basic and regexp/extended
     and can be backslash escaped.

     ( ) { } + ? and | are special in regexp/extended so must be escaped.
     But that can't be done with a backslash since in regexp/basic sequences
     \( \) \{ \} \+ \? and \| are special.  Character class forms [(] etc
     are used instead.

     ] is not special outside a [ ] character class, so doesn't need to be
     escaped.  */

#define REGEXP_QUOTE_BACKSLASH                  \
  case '[':                                     \
  case '*':                                     \
  case '.':                                     \
  case '\\':                                    \
  case '^':                                     \
  case '$'

#define REGEXP_QUOTE_CHARCLASS                  \
  case '(':                                     \
  case ')':                                     \
  case '{':                                     \
  case '}':                                     \
  case '+':                                     \
  case '?':                                     \
  case '|'

  for (i = 0, newlen = 0; i < len; i++)
    {
      switch (ptr[i]) {
      REGEXP_QUOTE_BACKSLASH:
        newlen += 2;
        break;
      REGEXP_QUOTE_CHARCLASS:
        newlen += 3;
        break;
      default:
        newlen += 1;
        break;
      }
    }

  if (newlen == len)
    return str;

  newstr = scm_i_make_string (newlen, &newptr);
  for (i = 0, j = 0; i < len; i++, j++)
    {
      char c = ptr[i];
      switch (c) {
      REGEXP_QUOTE_BACKSLASH:
        newptr[j++] = '\\';
        goto store_c;
      REGEXP_QUOTE_CHARCLASS:
        newptr[j++] = '[';
        newptr[j++] = c;
        newptr[j] = ']';
        break;
      default:
      store_c:
        newptr[j] = c;
        break;
      }
    }
  scm_remember_upto_here_1 (str);
  return newstr;
}
#undef FUNC_NAME
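
For trying the same two-pass quoting outside Guile, something like the
following should do.  It's a sketch under assumptions, not the code
above: quote_for_regexp and the two character-set macros are names made
up for this example, and it works on a plain NUL-terminated C string
with malloc instead of SCM strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Characters quoted with a backslash, and characters quoted with a
   one-character class, following the comment in scm_regexp_quote.  */
#define QUOTE_WITH_BACKSLASH  "[*.\\^$"
#define QUOTE_WITH_CHARCLASS  "(){}+?|"

/* Hypothetical standalone helper, not part of Guile.  Return a newly
   malloc'ed NUL-terminated copy of STR quoted for use in both basic
   and extended POSIX regexps, or NULL if allocation fails.  The caller
   frees the result.  */
static char *
quote_for_regexp (const char *str)
{
  size_t i, j, len, newlen;
  char *out;

  assert (str != NULL);
  len = strlen (str);

  /* First pass: work out the length of the quoted string.  */
  for (i = 0, newlen = 0; i < len; i++)
    {
      if (strchr (QUOTE_WITH_BACKSLASH, str[i]) != NULL)
        newlen += 2;            /* backslash plus the character */
      else if (strchr (QUOTE_WITH_CHARCLASS, str[i]) != NULL)
        newlen += 3;            /* the character between [ and ] */
      else
        newlen += 1;
    }

  out = malloc (newlen + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: copy, inserting the quoting.  */
  for (i = 0, j = 0; i < len; i++)
    {
      char c = str[i];
      if (strchr (QUOTE_WITH_BACKSLASH, c) != NULL)
        {
          out[j++] = '\\';
          out[j++] = c;
        }
      else if (strchr (QUOTE_WITH_CHARCLASS, c) != NULL)
        {
          out[j++] = '[';
          out[j++] = c;
          out[j++] = ']';
        }
      else
        out[j++] = c;
    }
  assert (j == newlen);
  out[j] = '\0';
  return out;
}
```

For example quote_for_regexp ("a+b(c)") gives a[+]b[(]c[)].  Unlike
scm_regexp_quote this always allocates a new string, even when there is
nothing to quote.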


