From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vijay Marupudi
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] Add string-split-substring
Date: Sat, 12 Feb 2022 23:03:18 -0500
Message-ID: <87r187qupl.fsf@vijaymarupudi.com>
References: <87tud3qxdo.fsf@vijaymarupudi.com>
To: Christopher Lam
Cc: guile-devel
List-Id: "Developers list for Guile, the GNU extensibility library"
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Thanks for taking a look, Christopher!

Christopher Lam writes:

> I think the last test should be:
>
>   (pass-if "string-split-substring - non-empty, trailing delimiters"
>     (equal? (string-split-substring "barfoo" "foo")
>             (list "bar" ""))))

Good catch, thank you! I have fixed it in the updated patch attached to
this email.

> And isn't it more efficient to use substring/shared instead of
> substring?

It might? However, it seems like that would violate the functional
expectation that modifying the strings returned from the function does
not change the string that was being split. Quoting the manual:

> Scheme Procedure: substring/shared str start [end]
> C Function: scm_substring_shared (str, start, end)
>
> Like substring, but the strings continue to share their storage
> even if they are modified. Thus, modifications to str show up in
> the new string, and vice versa.

That seems like surprising behavior to me (when you don't know that they
are shared). If sharing is important, one could add a
`string-split-substring/shared' procedure?

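To make the concern concrete, here is a minimal sketch of the difference, assuming the behavior described in the manual text quoted above. The names `str', `copied' and `shared' are only illustrative and are not part of the patch.

```scheme
;; Sketch: contrast `substring' (copy semantics) with `substring/shared'.
;; `str', `copied' and `shared' are illustrative names only.
;; `string-copy' is used because string literals are read-only in Guile.
(define str (string-copy "item-1::item-2"))

(define copied (substring str 0 6))         ; unaffected by later edits to str
(define shared (substring/shared str 0 6))  ; keeps sharing storage with str

;; Mutate the original string after taking both substrings.
(string-set! str 0 #\I)

(display copied) (newline)   ; expected: item-1
(display shared) (newline)   ; expected, per the manual: Item-1
```

A caller who receives `shared' from a splitting procedure would see it change whenever the original string is modified, which is the surprise described above.
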
~ Vijay

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0001-Add-string-split-substring.patch

>From 3c0940082ff49695ac9c2147c900b959be5f8e70 Mon Sep 17 00:00:00 2001
From: Vijay Marupudi
Date: Sat, 12 Feb 2022 22:00:57 -0500
Subject: [PATCH] Add string-split-substring

* doc/ref/api-data.texi: Added documentation
* module/ice-9/string-fun.scm: Added implementation
* test-suite/tests/strings.test: Added tests
---
 doc/ref/api-data.texi         | 10 ++++++++++
 module/ice-9/string-fun.scm   | 23 ++++++++++++++++++++++-
 test-suite/tests/strings.test | 22 +++++++++++++++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 8658b9785..a5fcc47b1 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -4245,6 +4245,16 @@ Return a new string where every instance of @var{substring} in string
 @end lisp
 @end deffn
 
+@deffn {Scheme Procedure} string-split-substring str substring
+Split the string @var{str} into a list of substrings delimited by the
+appearance of substring @var{substring}.  For example:
+
+@lisp
+(string-split-substring "item-1::item-2::item-3::item-4" "::")
+@result{} ("item-1" "item-2" "item-3" "item-4")
+@end lisp
+@end deffn
+
 @node Representing Strings as Bytes
 @subsubsection Representing Strings as Bytes
 
diff --git a/module/ice-9/string-fun.scm b/module/ice-9/string-fun.scm
index 03e0238fa..a1d4c0366 100644
--- a/module/ice-9/string-fun.scm
+++ b/module/ice-9/string-fun.scm
@@ -26,7 +26,7 @@
            separate-fields-before-char string-prefix-predicate string-prefix=?
            sans-surrounding-whitespace sans-trailing-whitespace
            sans-leading-whitespace sans-final-newline has-trailing-newline?
-           string-replace-substring))
+           string-replace-substring string-split-substring))
 
 ;;;;
 ;;;
@@ -313,3 +313,24 @@
            (else
             (display (substring/shared str start)))))))))
 
+(define (string-split-substring str substr)
+  "Split the string @var{str} into a list of substrings delimited by the
+substring @var{substr}."
+
+  (define substrlen (string-length substr))
+  (define strlen (string-length str))
+
+  (define (loop index start)
+    (cond
+     ((>= start strlen) (list ""))
+     ((not index) (list (substring str start)))
+     (else
+      (cons (substring str start index)
+            (let ((new-start (+ index substrlen)))
+              (loop (string-contains str substr new-start)
+                    new-start))))))
+
+  (cond
+   ((string-contains str substr) => (lambda (idx) (loop idx 0)))
+   (else (list str))))
+
diff --git a/test-suite/tests/strings.test b/test-suite/tests/strings.test
index 7393bc8ec..8bc26e3e3 100644
--- a/test-suite/tests/strings.test
+++ b/test-suite/tests/strings.test
@@ -699,4 +699,24 @@
 
   (pass-if "string-replace-substring"
     (string=? (string-replace-substring "a ring of strings" "ring" "rut")
-              "a rut of struts")))
+              "a rut of struts"))
+
+  (pass-if "string-split-substring - empty string"
+    (equal? (string-split-substring "" "foo")
+            '("")))
+
+  (pass-if "string-split-substring - non-empty, no delimiters"
+    (equal? (string-split-substring "testing" "foo")
+            '("testing")))
+
+  (pass-if "string-split-substring - non-empty, delimiters"
+    (equal? (string-split-substring "testingfoobar" "foo")
+            '("testing" "bar")))
+
+  (pass-if "string-split-substring - non-empty, leading delimiters"
+    (equal? (string-split-substring "foobar" "foo")
+            '("" "bar")))
+
+  (pass-if "string-split-substring - non-empty, trailing delimiters"
+    (equal? (string-split-substring "barfoo" "foo")
+            (list "bar" ""))))
-- 
2.35.1

--=-=-=--