From: Matthias Koeppe
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [Patch] SRFI-13 string-tokenize is wrong
Date: Fri, 26 Apr 2002 10:27:50 +0200
To: Marius Vollmer
Cc: Martin Grabmueller, guile-devel@gnu.org, haus@lambda.math.uni-magdeburg.de
References: <87bsc8oom8.fsf@zagadka.ping.de>
In-Reply-To: <87bsc8oom8.fsf@zagadka.ping.de> (Marius Vollmer's message of "24 Apr 2002 21:58:23 +0200")

Marius Vollmer writes:

> Matthias Koeppe writes:
>> the Guile implementation of SRFI-13 `string-tokenize' gets the meaning
>> of the `token-set' argument wrong.
>
> Yep, good catch. However, your patch did not make an unspecified
> token-set equivalent to char-set:graphic, but to (char-set-complement
> char-set:whitespace).

Thanks for pointing this out. I've updated the patch; see below. Now
this works correctly:

  (string-tokenize "Help make programs\arun, run,\nRUN!")
  ==> ("Help" "make" "programs" "run," "run," "RUN!")

-- 
Matthias

Index: srfi-13.c
===================================================================
RCS file: /cvs/guile/guile-core/srfi/srfi-13.c,v
retrieving revision 1.11.2.5
diff -u -u -r1.11.2.5 srfi-13.c
--- srfi-13.c 14 Mar 2002 05:32:48 -0000 1.11.2.5
+++ srfi-13.c 26 Apr 2002 08:20:04 -0000
@@ -2798,13 +2798,14 @@
 
 
 SCM_DEFINE (scm_string_tokenize, "string-tokenize", 1, 3, 0,
-            (SCM s, SCM token_char, SCM start, SCM end),
+            (SCM s, SCM token_set, SCM start, SCM end),
             "Split the string @var{s} into a list of substrings, where each\n"
             "substring is a maximal non-empty contiguous sequence of\n"
-            "characters equal to the character @var{token_char}, or\n"
-            "whitespace, if @var{token_char} is not given. If\n"
-            "@var{token_char} is a character set, it is used for finding the\n"
-            "token borders.")
+            "characters from the character set @var{token_set}, which\n"
+            "defaults to an equivalent of @code{char-set:graphic}.\n"
+            "If @var{start} or @var{end} indices are provided, they restrict\n"
+            "@code{string-tokenize} to operating on the indicated substring\n"
+            "of @var{s}.")
 #define FUNC_NAME s_scm_string_tokenize
 {
   char * cstr;
@@ -2814,7 +2815,7 @@
   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, s, cstr,
                                     3, start, cstart,
                                     4, end, cend);
-  if (SCM_UNBNDP (token_char))
+  if (SCM_UNBNDP (token_set))
     {
       int idx;
 
@@ -2822,7 +2823,7 @@
         {
           while (cstart < cend)
             {
-              if (!isspace (cstr[cend - 1]))
+              if (isgraph (cstr[cend - 1]))
                 break;
               cend--;
             }
@@ -2831,14 +2832,14 @@
           idx = cend;
           while (cstart < cend)
             {
-              if (isspace (cstr[cend - 1]))
+              if (!isgraph (cstr[cend - 1]))
                 break;
               cend--;
             }
           result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
         }
     }
-  else if (SCM_CHARSETP (token_char))
+  else if (SCM_CHARSETP (token_set))
     {
       int idx;
 
@@ -2846,7 +2847,7 @@
         {
           while (cstart < cend)
             {
-              if (!SCM_CHARSET_GET (token_char, cstr[cend - 1]))
+              if (SCM_CHARSET_GET (token_set, cstr[cend - 1]))
                 break;
               cend--;
             }
@@ -2855,41 +2856,14 @@
           idx = cend;
           while (cstart < cend)
             {
-              if (SCM_CHARSET_GET (token_char, cstr[cend - 1]))
-                break;
-              cend--;
-            }
-          result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
-        }
-    }
-  else
-    {
-      int idx;
-      char chr;
-
-      SCM_VALIDATE_CHAR (2, token_char);
-      chr = SCM_CHAR (token_char);
-
-      while (cstart < cend)
-        {
-          while (cstart < cend)
-            {
-              if (cstr[cend - 1] != chr)
-                break;
-              cend--;
-            }
-          if (cstart >= cend)
-            break;
-          idx = cend;
-          while (cstart < cend)
-            {
-              if (cstr[cend - 1] == chr)
+              if (!SCM_CHARSET_GET (token_set, cstr[cend - 1]))
                 break;
               cend--;
             }
           result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
         }
     }
+  else SCM_WRONG_TYPE_ARG (2, token_set);
   return result;
 }
 #undef FUNC_NAME

-- 
Matthias Koeppe --
http://www.math.uni-magdeburg.de/~mkoeppe