unofficial mirror of guile-devel@gnu.org 
From: Matthias Koeppe <mkoeppe@mail.Math.Uni-Magdeburg.De>
Cc: Martin Grabmueller <mgrabmue@cs.tu-berlin.de>,
	guile-devel@gnu.org, haus@lambda.math.uni-magdeburg.de
Subject: Re: [Patch] SRFI-13 string-tokenize is wrong
Date: Fri, 26 Apr 2002 10:27:50 +0200
Message-ID: <uw5vgaedfuh.fsf@lambda.math.uni-magdeburg.de>
In-Reply-To: <87bsc8oom8.fsf@zagadka.ping.de> (Marius Vollmer's message of "24 Apr 2002 21:58:23 +0200")

Marius Vollmer <mvo@zagadka.ping.de> writes:

> Matthias Koeppe <mkoeppe@mail.Math.Uni-Magdeburg.De> writes:
>> the Guile implementation of SRFI-13 `string-tokenize' gets the meaning
>> of the `token-set' argument wrong.
>
> Yep, good catch.  However, your patch did not make an unspecified
> token-set equivalent to char-set:graphic, but to (char-set-complement
> char-set:whitespace).

Thanks for pointing this out.  I've updated the patch; see below.

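With the updated patch, the default token set treats non-graphic
characters such as \a and \n as delimiters, so this now works correctly: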

 (string-tokenize "Help make programs\arun, run,\nRUN!")
 ==> ("Help" "make" "programs" "run," "run," "RUN!")

-- Matthias

Index: srfi-13.c
===================================================================
RCS file: /cvs/guile/guile-core/srfi/srfi-13.c,v
retrieving revision 1.11.2.5
diff -u -u -r1.11.2.5 srfi-13.c
--- srfi-13.c	14 Mar 2002 05:32:48 -0000	1.11.2.5
+++ srfi-13.c	26 Apr 2002 08:20:04 -0000
@@ -2798,13 +2798,14 @@
 
 
 SCM_DEFINE (scm_string_tokenize, "string-tokenize", 1, 3, 0,
-	    (SCM s, SCM token_char, SCM start, SCM end),
+	    (SCM s, SCM token_set, SCM start, SCM end),
 	    "Split the string @var{s} into a list of substrings, where each\n"
 	    "substring is a maximal non-empty contiguous sequence of\n"
-	    "characters equal to the character @var{token_char}, or\n"
-	    "whitespace, if @var{token_char} is not given.  If\n"
-	    "@var{token_char} is a character set, it is used for finding the\n"
-	    "token borders.")
+	    "characters from the character set @var{token_set}, which\n"
+	    "defaults to an equivalent of @code{char-set:graphic}.\n"
+	    "If @var{start} or @var{end} indices are provided, they restrict\n"
+	    "@code{string-tokenize} to operating on the indicated substring\n"
+	    "of @var{s}.")
 #define FUNC_NAME s_scm_string_tokenize
 {
   char * cstr;
@@ -2814,7 +2815,7 @@
   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, s, cstr,
 				    3, start, cstart,
 				    4, end, cend);
-  if (SCM_UNBNDP (token_char))
+  if (SCM_UNBNDP (token_set))
     {
       int idx;
 
@@ -2822,7 +2823,7 @@
 	{
 	  while (cstart < cend)
 	    {
-	      if (!isspace (cstr[cend - 1]))
+	      if (isgraph (cstr[cend - 1]))
 		break;
 	      cend--;
 	    }
@@ -2831,14 +2832,14 @@
 	  idx = cend;
 	  while (cstart < cend)
 	    {
-	      if (isspace (cstr[cend - 1]))
+	      if (!isgraph (cstr[cend - 1]))
 		break;
 	      cend--;
 	    }
 	  result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
 	}
     }
-  else if (SCM_CHARSETP (token_char))
+  else if (SCM_CHARSETP (token_set))
     {
       int idx;
 
@@ -2846,7 +2847,7 @@
 	{
 	  while (cstart < cend)
 	    {
-	      if (!SCM_CHARSET_GET (token_char, cstr[cend - 1]))
+	      if (SCM_CHARSET_GET (token_set, cstr[cend - 1]))
 		break;
 	      cend--;
 	    }
@@ -2855,41 +2856,14 @@
 	  idx = cend;
 	  while (cstart < cend)
 	    {
-	      if (SCM_CHARSET_GET (token_char, cstr[cend - 1]))
-		break;
-	      cend--;
-	    }
-	  result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
-	}
-    }
-  else
-    {
-      int idx;
-      char chr;
-
-      SCM_VALIDATE_CHAR (2, token_char);
-      chr = SCM_CHAR (token_char);
-
-      while (cstart < cend)
-	{
-	  while (cstart < cend)
-	    {
-	      if (cstr[cend - 1] != chr)
-		break;
-	      cend--;
-	    }
-	  if (cstart >= cend)
-	    break;
-	  idx = cend;
-	  while (cstart < cend)
-	    {
-	      if (cstr[cend - 1] == chr)
+	      if (!SCM_CHARSET_GET (token_set, cstr[cend - 1]))
 		break;
 	      cend--;
 	    }
 	  result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
 	}
     }
+  else SCM_WRONG_TYPE_ARG (2, token_set);
   return result;
 }
 #undef FUNC_NAME

-- 
Matthias Koeppe -- http://www.math.uni-magdeburg.de/~mkoeppe



