From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Mark H Weaver Newsgroups: gmane.lisp.guile.devel Subject: Re: [PATCH] In string-split, add support for character sets and predicates. Date: Tue, 09 Oct 2012 13:48:25 -0400 Message-ID: <874nm3mt92.fsf@tines.lan> References: <87sj9pm0oz.fsf@tines.lan> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: ger.gmane.org 1349804976 15905 80.91.229.3 (9 Oct 2012 17:49:36 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 9 Oct 2012 17:49:36 +0000 (UTC) Cc: guile-devel@gnu.org To: Daniel Hartwig Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Tue Oct 09 19:49:41 2012 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1TLdw0-0001SY-Un for guile-devel@m.gmane.org; Tue, 09 Oct 2012 19:49:37 +0200 Original-Received: from localhost ([::1]:38093 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TLdvu-0002tb-Mr for guile-devel@m.gmane.org; Tue, 09 Oct 2012 13:49:30 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:44012) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TLdve-0002WO-O0 for guile-devel@gnu.org; Tue, 09 Oct 2012 13:49:28 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TLdvJ-0002U9-Ap for guile-devel@gnu.org; Tue, 09 Oct 2012 13:49:13 -0400 Original-Received: from world.peace.net ([96.39.62.75]:49388) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TLdvJ-0002KF-4T for guile-devel@gnu.org; Tue, 09 Oct 2012 13:48:53 -0400 Original-Received: from 209-6-91-212.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com ([209.6.91.212] helo=tines.lan) by world.peace.net with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1TLduy-000751-QG; Tue, 09 Oct 2012 13:48:33 -0400 In-Reply-To: (Daniel Hartwig's message of "Tue, 9 Oct 2012 11:34:46 +0800") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 96.39.62.75 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:14946 Archived-At: Hi Daniel, The updated patch looks good except for a few minor nitpicks: Daniel Hartwig writes: > From 46178db9eecc9ca402d9571c3ee520074f15ef5a Mon Sep 17 00:00:00 2001 > From: Daniel Hartwig > Date: Mon, 8 Oct 2012 18:35:00 +0800 > Subject: [PATCH] In string-split, add support for character sets and > predicates. > > * libguile/srfi-13.c (string-split): Add support for splitting on > character sets and predicates, like string-index and others. > * test-suite/tests/strings.test (string-split): Add tests covering > the new argument types. > --- > libguile/srfi-13.c | 102 +++++++++++++++++++++++++++++------------ > libguile/srfi-13.h | 2 +- > test-suite/tests/strings.test | 62 ++++++++++++++++++++++++- > 3 files changed, 134 insertions(+), 32 deletions(-) > > diff --git a/libguile/srfi-13.c b/libguile/srfi-13.c > index 2834553..605ba21 100644 > --- a/libguile/srfi-13.c > +++ b/libguile/srfi-13.c > @@ -2993,11 +2993,22 @@ SCM_DEFINE (scm_string_tokenize, "string-tokenize", 1, 3, 0, > #undef FUNC_NAME > > SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0, > - (SCM str, SCM chr), > + (SCM str, SCM char_pred), > "Split the string @var{str} into a list of the substrings delimited\n" > - "by appearances of the character @var{chr}. Note that an empty substring\n" > - "between separator characters will result in an empty string in the\n" > - "result list.\n" > + "by appearances of characters that\n" > + "\n" > + "@itemize @bullet\n" > + "@item\n" > + "equal @var{char_pred}, if it is a character,\n" > + "\n" > + "@item\n" > + "satisfy the predicate @var{char_pred}, if it is a procedure,\n" > + "\n" > + "@item\n" > + "are in the set @var{char_pred}, if it is a character set.\n" > + "@end itemize\n\n" > + "Note that an empty substring between separator characters\n" > + "will result in an empty string in the result list.\n" > "\n" > "@lisp\n" > "(string-split \"root:x:0:0:root:/root:/bin/bash\" #\\:)\n" > @@ -3014,47 +3025,78 @@ SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0, > "@end lisp") > #define FUNC_NAME s_scm_string_split > { > - long idx, last_idx; > - int narrow; > SCM res = SCM_EOL; > > SCM_VALIDATE_STRING (1, str); > - SCM_VALIDATE_CHAR (2, chr); > > - /* This is explicit wide/narrow logic (instead of using > - scm_i_string_ref) is a speed optimization. */ > - idx = scm_i_string_length (str); > - narrow = scm_i_is_narrow_string (str); > - if (narrow) > + if (SCM_CHARP (char_pred)) > { > - const char *buf = scm_i_string_chars (str); > - while (idx >= 0) > + long idx, last_idx; > + int narrow; > + > + /* This is explicit wide/narrow logic (instead of using > + scm_i_string_ref) is a speed optimization. */ > + idx = scm_i_string_length (str); > + narrow = scm_i_is_narrow_string (str); > + if (narrow) > { > - last_idx = idx; > - while (idx > 0 && buf[idx-1] != (char) SCM_CHAR(chr)) > - idx--; > - if (idx >= 0) > + const char *buf = scm_i_string_chars (str); > + while (idx >= 0) > { > - res = scm_cons (scm_i_substring (str, idx, last_idx), res); > - idx--; > + last_idx = idx; > + while (idx > 0 && buf[idx-1] != (char) SCM_CHAR(char_pred)) > + idx--; > + if (idx >= 0) > + { > + res = scm_cons (scm_i_substring (str, idx, last_idx), res); > + idx--; > + } > } > } > + else > + { > + const scm_t_wchar *buf = scm_i_string_wide_chars (str); > + while (idx >= 0) > + { > + last_idx = idx; > + while (idx > 0 && buf[idx-1] != SCM_CHAR(char_pred)) > + idx--; > + if (idx >= 0) > + { > + res = scm_cons (scm_i_substring (str, idx, last_idx), res); > + idx--; > + } > + } > + } > + goto done; This 'goto done' is a no-op. Please remove it. > } > else > { > - const scm_t_wchar *buf = scm_i_string_wide_chars (str); > - while (idx >= 0) > + SCM sidx, slast_idx; > + > + if (!SCM_CHARSETP (char_pred)) > { > - last_idx = idx; > - while (idx > 0 && buf[idx-1] != SCM_CHAR(chr)) > - idx--; > - if (idx >= 0) > - { > - res = scm_cons (scm_i_substring (str, idx, last_idx), res); > - idx--; > - } > + SCM_ASSERT (scm_is_true (scm_procedure_p (char_pred)), > + char_pred, SCM_ARG2, FUNC_NAME); > + } Please drop the unneeded curly braces above. > + > + /* Supporting predicates and character sets involves handling SCM > + values so there is less chance to optimize. */ > + slast_idx = scm_string_length (str); > + for (;;) > + { > + sidx = scm_string_index_right (str, char_pred, SCM_INUM0, slast_idx); > + if (scm_is_false (sidx)) > + break; > + res = scm_cons (scm_substring (str, scm_oneplus (sidx), slast_idx), res); > + slast_idx = sidx; > } > + > + res = scm_cons (scm_substring (str, SCM_INUM0, slast_idx), res); > + goto done; This 'goto done' is a no-op. Please remove it. > } > + > + done: and remove this label. > scm_remember_upto_here_1 (str); > return res; > } After these changes, I think the patch will be ready to push, unless anyone else has suggestions. Thanks! Mark