From: Vijay Marupudi <vijay@vijaymarupudi.com>
To: Christopher Lam <christopher.lck@gmail.com>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: [PATCH] Add string-split-substring
Date: Sat, 12 Feb 2022 23:03:18 -0500
Message-ID: <87r187qupl.fsf@vijaymarupudi.com>
In-Reply-To: <CAKVAZZLF8KKhT4YyakN_FtthugPyeVxXZdppSdV4gvuQBPDDtQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]
Thanks for taking a look, Christopher!
Christopher Lam <christopher.lck@gmail.com> writes:
> I think the last test should be:
>
> (pass-if "string-split-substring - non-empty, trailing delimiters"
> (equal? (string-split-substring "barfoo" "foo")
> (list "bar" ""))))
Good catch, thank you! I have fixed it in the updated patch attached to
this email.
> And isn't it more efficient to use substring/shared instead of
> substring?
It might. However, it seems like that would violate the functional
expectation that modifying the strings returned from the function
doesn't change the string that was being split.
Quoting the manual:
> Scheme Procedure: substring/shared str start [end]
> C Function: scm_substring_shared (str, start, end)
>
> Like substring, but the strings continue to share their storage
> even if they are modified. Thus, modifications to str show up in
> the new string, and vice versa.
That seems like surprising behavior to me (when you don't know that they
are shared). If sharing is important, one could add a separate
`string-split-substring/shared' procedure.
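To make the difference concrete, here is a small sketch of the behavior
the manual describes. The variable names are only illustrative, and it
assumes a Guile where string literals are read-only, hence the
string-copy calls:

```scheme
;; Illustrative names only; string-copy is used because string
;; literals are read-only in Guile.
(define original-a (string-copy "item-1::item-2"))
(define original-b (string-copy "item-1::item-2"))

;; `substring' returns an independent string: mutating the piece
;; leaves original-a untouched.
(define copied (substring original-a 0 6))
(string-set! copied 0 #\I)

;; `substring/shared' keeps sharing storage: per the manual, the
;; same mutation shows up in original-b as well.
(define shared (substring/shared original-b 0 6))
(string-set! shared 0 #\I)

(display original-a) (newline)
(display original-b) (newline)
```

With plain substring the caller can freely mutate the returned pieces,
which is the expectation the patch tries to preserve.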
~ Vijay
[-- Attachment #2: 0001-Add-string-split-substring.patch --]
[-- Type: text/x-patch, Size: 3659 bytes --]
From 3c0940082ff49695ac9c2147c900b959be5f8e70 Mon Sep 17 00:00:00 2001
From: Vijay Marupudi <vijay@vijaymarupudi.com>
Date: Sat, 12 Feb 2022 22:00:57 -0500
Subject: [PATCH] Add string-split-substring
* doc/ref/api-data.texi: Added documentation
* module/ice-9/string-fun.scm: Added implementation
* test-suite/tests/strings.test: Added tests
---
doc/ref/api-data.texi | 10 ++++++++++
module/ice-9/string-fun.scm | 23 ++++++++++++++++++++++-
test-suite/tests/strings.test | 22 +++++++++++++++++++++-
3 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 8658b9785..a5fcc47b1 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -4245,6 +4245,16 @@ Return a new string where every instance of @var{substring} in string
@end lisp
@end deffn
+@deffn {Scheme Procedure} string-split-substring str substring
+Split the string @var{str} into a list of substrings delimited by the
+appearance of substring @var{substring}. For example:
+
+@lisp
+(string-split-substring "item-1::item-2::item-3::item-4" "::")
+@result{} ("item-1" "item-2" "item-3" "item-4")
+@end lisp
+@end deffn
+
@node Representing Strings as Bytes
@subsubsection Representing Strings as Bytes
diff --git a/module/ice-9/string-fun.scm b/module/ice-9/string-fun.scm
index 03e0238fa..a1d4c0366 100644
--- a/module/ice-9/string-fun.scm
+++ b/module/ice-9/string-fun.scm
@@ -26,7 +26,7 @@
separate-fields-before-char string-prefix-predicate string-prefix=?
sans-surrounding-whitespace sans-trailing-whitespace
sans-leading-whitespace sans-final-newline has-trailing-newline?
- string-replace-substring))
+ string-replace-substring string-split-substring))
;;;;
;;;
@@ -313,3 +313,24 @@
(else
(display (substring/shared str start)))))))))
+(define (string-split-substring str substr)
+ "Split the string @var{str} into a list of substrings delimited by the
+substring @var{substr}."
+
+ (define substrlen (string-length substr))
+ (define strlen (string-length str))
+
+ (define (loop index start)
+ (cond
+ ((>= start strlen) (list ""))
+ ((not index) (list (substring str start)))
+ (else
+ (cons (substring str start index)
+ (let ((new-start (+ index substrlen)))
+ (loop (string-contains str substr new-start)
+ new-start))))))
+
+ (cond
+ ((string-contains str substr) => (lambda (idx) (loop idx 0)))
+ (else (list str))))
+
diff --git a/test-suite/tests/strings.test b/test-suite/tests/strings.test
index 7393bc8ec..8bc26e3e3 100644
--- a/test-suite/tests/strings.test
+++ b/test-suite/tests/strings.test
@@ -699,4 +699,24 @@
(pass-if "string-replace-substring"
(string=? (string-replace-substring "a ring of strings" "ring" "rut")
- "a rut of struts")))
+ "a rut of struts"))
+
+ (pass-if "string-split-substring - empty string"
+ (equal? (string-split-substring "" "foo")
+ '("")))
+
+ (pass-if "string-split-substring - non-empty, no delimiters"
+ (equal? (string-split-substring "testing" "foo")
+ '("testing")))
+
+ (pass-if "string-split-substring - non-empty, delimiters"
+ (equal? (string-split-substring "testingfoobar" "foo")
+ '("testing" "bar")))
+
+ (pass-if "string-split-substring - non-empty, leading delimiters"
+ (equal? (string-split-substring "foobar" "foo")
+ '("" "bar")))
+
+ (pass-if "string-split-substring - non-empty, trailing delimiters"
+ (equal? (string-split-substring "barfoo" "foo")
+ (list "bar" ""))))
--
2.35.1
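For experimenting with the splitting behavior outside of the patch, a
standalone sketch along the same lines might look like the following.
The name split-on-substring is hypothetical (it is not what the patch
exports), and the empty-delimiter check is an added assumption: with an
empty delimiter string-contains matches at the start position every
time, so the search would never advance.

```scheme
;; Hypothetical standalone variant of the patch's procedure; the name
;; and the empty-delimiter guard are assumptions, not part of the patch.
(define (split-on-substring str substr)
  (when (string-null? substr)
    (error "split-on-substring: empty delimiter" substr))
  (let ((substrlen (string-length substr)))
    (let loop ((start 0))
      ;; Look for the next delimiter at or after START.
      (let ((index (string-contains str substr start)))
        (if index
            ;; Emit the piece before the delimiter, then continue
            ;; searching just past it.
            (cons (substring str start index)
                  (loop (+ index substrlen)))
            ;; No delimiter left: the remainder (possibly "") is the
            ;; final piece.
            (list (substring str start)))))))

(write (split-on-substring "item-1::item-2::item-3::item-4" "::"))
(newline)
(write (split-on-substring "barfoo" "foo"))
(newline)
```

Like the patch, this uses substring rather than substring/shared, so the
returned pieces can be modified without affecting the input string.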