From: Kevin Ryde
Newsgroups: gmane.lisp.guile.devel
To: guile-devel@gnu.org
Subject: regexp-quote bugs
Date: Sun, 22 Aug 2004 11:43:23 +1000
Message-ID: <87brh42bic.fsf@zip.com.au>

        * regex-posix.c (scm_regexp_quote): Rewrite of ice-9 regex
        regexp-quote in C.  Fix [ and |, they must be quoted.  Fix quoting
        of ( ) { + ? for regexp/basic, must use char class [(] etc since
        a backslash as in \( in fact makes them special.

This is for 1.6 too.

I'm assuming regexp-quote is meant to quote for both regexp/basic and
regexp/extended usages.  At the moment it's got problems in both.

New code below (I have to find the posix spec to double check what needs
to be done, but this works with glibc for a start).


SCM_DEFINE (scm_regexp_quote, "regexp-quote", 1, 0, 0,
            (SCM str),
            "Return a regexp string which matches @var{str} literally, ie.@:\n"
            "any characters like @samp{*} in @var{str} which are special in\n"
            "a regexp are quoted.  If there are no special characters then\n"
            "@var{str} itself is returned.\n"
            "\n"
            "The regexp returned can be used with both @code{regexp/basic}\n"
            "and @code{regexp/extended}, the quoting applied is safe for\n"
            "both styles.")
#define FUNC_NAME s_scm_regexp_quote
{
  size_t i, j, len, newlen;
  const char *ptr;
  char *newptr;
  SCM newstr;

  SCM_VALIDATE_STRING (SCM_ARG1, str);
  ptr = scm_i_string_chars (str);
  len = scm_i_string_length (str);

  /* [ * . \ ^ and $ are special in both regexp/basic and regexp/extended
     and can be backslash escaped.

     ( ) { } + ? and | are special in regexp/extended so must be escaped.
     But that can't be done with a backslash since in regexp/basic the
     sequences \( \) \{ \} \+ \? and \| are special.  Character class
     forms [(] etc are used instead.

     ] is not special outside a [ ] character class, so doesn't need to
     be escaped.  */

#define REGEXP_QUOTE_BACKSLASH  \
    case '[':                   \
    case '*':                   \
    case '.':                   \
    case '\\':                  \
    case '^':                   \
    case '$'

#define REGEXP_QUOTE_CHARCLASS  \
    case '(':                   \
    case ')':                   \
    case '{':                   \
    case '}':                   \
    case '+':                   \
    case '?':                   \
    case '|'

  /* First pass: work out the length of the quoted string.  */
  for (i = 0, newlen = 0; i < len; i++)
    {
      switch (ptr[i]) {
      REGEXP_QUOTE_BACKSLASH:
        newlen += 2;
        break;
      REGEXP_QUOTE_CHARCLASS:
        newlen += 3;
        break;
      default:
        newlen += 1;
        break;
      }
    }

  /* Nothing special found, the original string can be returned.  */
  if (newlen == len)
    return str;

  /* Second pass: copy, inserting the quoting.  */
  newstr = scm_i_make_string (newlen, &newptr);
  for (i = 0, j = 0; i < len; i++, j++)
    {
      char c = ptr[i];
      switch (c) {
      REGEXP_QUOTE_BACKSLASH:
        newptr[j++] = '\\';
        goto store_c;
      REGEXP_QUOTE_CHARCLASS:
        newptr[j++] = '[';
        newptr[j++] = c;
        newptr[j] = ']';
        break;
      default:
      store_c:
        newptr[j] = c;
        break;
      }
    }

  scm_remember_upto_here_1 (str);
  return newstr;
}
#undef FUNC_NAME
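The quoting rule described above can be tried outside Guile. What follows is only a minimal sketch, assuming plain NUL-terminated C strings and malloc in place of Guile's SCM strings; it is not the code proposed for regex-posix.c. The names quote_regexp, needs_backslash, needs_charclass and quoted_matches_self are hypothetical, made up for this illustration. It uses the POSIX regcomp and regexec functions to check that a quoted string, compiled as either a basic or an extended regexp, matches the original text in full.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Special in both basic and extended syntax: backslash escape.  */
static int needs_backslash(char c)
{
    return c != '\0' && strchr("[*.\\^$", c) != NULL;
}

/* Special in extended syntax only: wrap in a one-character class such
   as [(], because \( and friends are special in basic syntax.  */
static int needs_charclass(char c)
{
    return c != '\0' && strchr("(){}+?|", c) != NULL;
}

/* Hypothetical helper: return a malloc'd quoted copy of s, or NULL if
   allocation fails.  The caller frees the result.  */
char *quote_regexp(const char *s)
{
    size_t len, i, j = 0;
    char *out;

    assert(s != NULL);
    len = strlen(s);
    out = malloc(3 * len + 1);  /* worst case: every char becomes [c] */
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        char c = s[i];
        if (needs_backslash(c)) {
            out[j++] = '\\';
            out[j++] = c;
        } else if (needs_charclass(c)) {
            out[j++] = '[';
            out[j++] = c;
            out[j++] = ']';
        } else {
            out[j++] = c;
        }
    }
    out[j] = '\0';
    return out;
}

/* Hypothetical check: return 1 if the quoted form of text, compiled with
   cflags (0 for basic, REG_EXTENDED for extended), matches the whole of
   text starting at offset 0, and 0 otherwise.  */
int quoted_matches_self(const char *text, int cflags)
{
    regex_t re;
    regmatch_t m;
    char *pat;
    int ok;

    pat = quote_regexp(text);
    if (pat == NULL)
        return 0;
    if (regcomp(&re, pat, cflags) != 0) {
        free(pat);
        return 0;
    }
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0
         && (size_t) m.rm_eo == strlen(text);
    regfree(&re);
    free(pat);
    return ok;
}
```

Calling quoted_matches_self ("a+b(c)", 0) and quoted_matches_self ("a+b(c)", REG_EXTENDED) exercises both syntaxes on the same quoted string. Unlike the C code in the message, this sketch always allocates a new string and skips the separate length-counting pass, which keeps it short at the cost of over-allocating.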