unofficial mirror of guile-devel@gnu.org 
* [PATCH] Add string-split-substring
@ 2022-02-13  3:05 Vijay Marupudi
  2022-02-13  3:28 ` Christopher Lam
  0 siblings, 1 reply; 3+ messages in thread
From: Vijay Marupudi @ 2022-02-13  3:05 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

Hello all,

I have added a function named `string-split-substring' to the (ice-9
string-fun) module. It acts like `string-split', but takes a substring
as the delimiter instead of a character. It works like this:

(string-split-substring "item-1::item-2::item-3::item-4" "::")
=> ("item-1" "item-2" "item-3" "item-4")

The tests include all the edge cases in the tests for string-split, and
the behavior matches it exactly.
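
For reference, here is a small sketch of the `string-split' edge cases
that the new tests mirror. It uses only the built-in `string-split' with
a character delimiter, so it does not depend on the patch; the sample
strings are made up for illustration.

```scheme
;; Built-in string-split with a character delimiter.
;; The sample strings are illustrative only.

;; Delimiter in the middle: two fields.
(display (string-split "testing:bar" #\:)) (newline)

;; Empty input: a list holding one empty string, not the empty list.
(display (string-split "" #\:)) (newline)

;; Leading delimiter: an empty first field is kept.
(display (string-split ":bar" #\:)) (newline)

;; Trailing delimiter: an empty last field is kept.
(display (string-split "bar:" #\:)) (newline)
```

The substring version is meant to give the same shapes of result when
the character delimiter is swapped for a string such as "foo".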

Documentation is also included in the patch.

I have found myself making and using this function numerous times, and
judging by IRC, others find it useful as well. The patch is attached.

~ Vijay


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-string-split-substring.patch --]
[-- Type: text/x-patch, Size: 3647 bytes --]

From 44ba1874d32e188fdd999e113781548ab2b128fa Mon Sep 17 00:00:00 2001
From: Vijay Marupudi <vijay@vijaymarupudi.com>
Date: Sat, 12 Feb 2022 22:00:57 -0500
Subject: [PATCH] Add string-split-substring

* /ref/api-data.texi: Added documentation
* module/ice-9/string-fun.scm: Added implementation
* test-suite/tests/strings.test: Added tests
---
 doc/ref/api-data.texi         | 10 ++++++++++
 module/ice-9/string-fun.scm   | 23 ++++++++++++++++++++++-
 test-suite/tests/strings.test | 22 +++++++++++++++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 8658b9785..a5fcc47b1 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -4245,6 +4245,16 @@ Return a new string where every instance of @var{substring} in string
 @end lisp
 @end deffn
 
+@deffn {Scheme Procedure} string-split-substring str substring
+Split the string @var{str} into a list of substrings delimited by the
+appearance of substring @var{substring}. For example:
+
+@lisp
+(string-split-substring "item-1::item-2::item-3::item-4" "::")
+@result{} ("item-1" "item-2" "item-3" "item-4")
+@end lisp
+@end deffn
+
 @node Representing Strings as Bytes
 @subsubsection Representing Strings as Bytes
 
diff --git a/module/ice-9/string-fun.scm b/module/ice-9/string-fun.scm
index 03e0238fa..a1d4c0366 100644
--- a/module/ice-9/string-fun.scm
+++ b/module/ice-9/string-fun.scm
@@ -26,7 +26,7 @@
 	   separate-fields-before-char string-prefix-predicate string-prefix=?
 	   sans-surrounding-whitespace sans-trailing-whitespace
 	   sans-leading-whitespace sans-final-newline has-trailing-newline?
-           string-replace-substring))
+           string-replace-substring string-split-substring))
 
 ;;;;
 ;;;
@@ -313,3 +313,24 @@
            (else
             (display (substring/shared str start)))))))))
 
+(define (string-split-substring str substr)
+  "Split the string @var{str} into a list of substrings delimited by the
+substring @var{substr}."
+
+  (define substrlen (string-length substr))
+  (define strlen (string-length str))
+
+  (define (loop index start)
+    (cond
+     ((>= start strlen) (list ""))
+     ((not index) (list (substring str start)))
+     (else
+      (cons (substring str start index)
+            (let ((new-start (+ index substrlen)))
+              (loop (string-contains str substr new-start)
+                    new-start))))))
+
+  (cond
+   ((string-contains str substr) => (lambda (idx) (loop idx 0)))
+   (else (list str))))
+
diff --git a/test-suite/tests/strings.test b/test-suite/tests/strings.test
index 7393bc8ec..0c09e97c8 100644
--- a/test-suite/tests/strings.test
+++ b/test-suite/tests/strings.test
@@ -699,4 +699,24 @@
 
   (pass-if "string-replace-substring"
     (string=? (string-replace-substring "a ring of strings" "ring" "rut")
-              "a rut of struts")))
+              "a rut of struts"))
+
+  (pass-if "string-split-substring - empty string"
+    (equal? (string-split-substring "" "foo")
+            '("")))
+
+  (pass-if "string-split-substring - non-empty, no delimiters"
+    (equal? (string-split-substring "testing" "foo")
+            '("testing")))
+
+  (pass-if "string-split-substring - non-empty, delimiters"
+    (equal? (string-split-substring "testingfoobar" "foo")
+            '("testing" "bar")))
+
+  (pass-if "string-split-substring - non-empty, leading delimiters"
+    (equal? (string-split-substring "foobar" "foo")
+            '("" "bar")))
+
+  (pass-if "string-split-substring - non-empty, trailing delimiters"
+    (equal? (string-split-substring "" "foo")
+            (list ""))))
-- 
2.35.1



* Re: [PATCH] Add string-split-substring
  2022-02-13  3:05 [PATCH] Add string-split-substring Vijay Marupudi
@ 2022-02-13  3:28 ` Christopher Lam
  2022-02-13  4:03   ` Vijay Marupudi
  0 siblings, 1 reply; 3+ messages in thread
From: Christopher Lam @ 2022-02-13  3:28 UTC (permalink / raw)
  To: Vijay Marupudi; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

I think the last test should be:

  (pass-if "string-split-substring - non-empty, trailing delimiters"
    (equal? (string-split-substring "barfoo" "foo")
            (list "bar" ""))))

And isn't it more efficient to use substring/shared instead of substring?

On Sun, 13 Feb 2022 at 03:06, Vijay Marupudi <vijay@vijaymarupudi.com>
wrote:

> Hello all,
>
> I have added a function named `string-split-substring' to the (ice-9
> string-fun) module. It acts like `string-split', but takes a substring
> as the delimiter instead of a character. It works like this:
>
> (string-split-substring "item-1::item-2::item-3::item-4" "::")
> => ("item-1" "item-2" "item-3" "item-4")
>
> The tests include all the edge cases in the tests for string-split, and
> the behavior matches it exactly.
>
> Documentation is also included in the patch.
>
> I have found myself making and using this function numerous times, and
> judging by IRC, others find it useful as well. The patch is attached.
>
> ~ Vijay
>
>



* Re: [PATCH] Add string-split-substring
  2022-02-13  3:28 ` Christopher Lam
@ 2022-02-13  4:03   ` Vijay Marupudi
  0 siblings, 0 replies; 3+ messages in thread
From: Vijay Marupudi @ 2022-02-13  4:03 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

Thanks for taking a look, Christopher!

Christopher Lam <christopher.lck@gmail.com> writes:

> I think the last test should be:
>
>   (pass-if "string-split-substring - non-empty, trailing delimiters"
>     (equal? (string-split-substring "barfoo" "foo")
>             (list "bar" ""))))

Good catch, thank you! I have fixed it in the updated patch attached to
this email.

> And isn't it more efficient to use substring/shared instead of
> substring?

It might. However, it seems like that would violate the functional
expectation that modifying the strings returned from the function does
not change the string that was being split.

Quoting the manual:

> Scheme Procedure: substring/shared str start [end]
> C Function: scm_substring_shared (str, start, end)
>
>     Like substring, but the strings continue to share their storage
>     even if they are modified. Thus, modifications to str show up in
>     the new string, and vice versa.

That seems like surprising behavior to me (when you don't know that they
are shared). If sharing is important, one could add a separate
`string-split-substring/shared' procedure.
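
A minimal sketch of the difference, assuming the behavior described in
the manual text quoted above. The variable names and the sample string
are made up for illustration; `string-copy' is used because string
literals are read-only in Guile.

```scheme
;; Start from a mutable string (literals are read-only in Guile).
(define original (string-copy "bar::baz"))

;; substring yields a string that can be mutated independently.
(define copied (substring original 0 3))

;; substring/shared yields a string that shares storage with original.
(define shared (substring/shared original 0 3))

;; Writing to the copy leaves original untouched.
(string-set! copied 0 #\C)

;; Writing to the shared substring also writes through to original.
(string-set! shared 0 #\S)

(display original) (newline)
(display copied) (newline)
(display shared) (newline)
```

With `substring', a caller can mutate any piece of the split result
without touching the input, which is the expectation argued for here.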

~ Vijay


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-string-split-substring.patch --]
[-- Type: text/x-patch, Size: 3659 bytes --]

From 3c0940082ff49695ac9c2147c900b959be5f8e70 Mon Sep 17 00:00:00 2001
From: Vijay Marupudi <vijay@vijaymarupudi.com>
Date: Sat, 12 Feb 2022 22:00:57 -0500
Subject: [PATCH] Add string-split-substring

* /ref/api-data.texi: Added documentation
* module/ice-9/string-fun.scm: Added implementation
* test-suite/tests/strings.test: Added tests
---
 doc/ref/api-data.texi         | 10 ++++++++++
 module/ice-9/string-fun.scm   | 23 ++++++++++++++++++++++-
 test-suite/tests/strings.test | 22 +++++++++++++++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 8658b9785..a5fcc47b1 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -4245,6 +4245,16 @@ Return a new string where every instance of @var{substring} in string
 @end lisp
 @end deffn
 
+@deffn {Scheme Procedure} string-split-substring str substring
+Split the string @var{str} into a list of substrings delimited by the
+appearance of substring @var{substring}. For example:
+
+@lisp
+(string-split-substring "item-1::item-2::item-3::item-4" "::")
+@result{} ("item-1" "item-2" "item-3" "item-4")
+@end lisp
+@end deffn
+
 @node Representing Strings as Bytes
 @subsubsection Representing Strings as Bytes
 
diff --git a/module/ice-9/string-fun.scm b/module/ice-9/string-fun.scm
index 03e0238fa..a1d4c0366 100644
--- a/module/ice-9/string-fun.scm
+++ b/module/ice-9/string-fun.scm
@@ -26,7 +26,7 @@
 	   separate-fields-before-char string-prefix-predicate string-prefix=?
 	   sans-surrounding-whitespace sans-trailing-whitespace
 	   sans-leading-whitespace sans-final-newline has-trailing-newline?
-           string-replace-substring))
+           string-replace-substring string-split-substring))
 
 ;;;;
 ;;;
@@ -313,3 +313,24 @@
            (else
             (display (substring/shared str start)))))))))
 
+(define (string-split-substring str substr)
+  "Split the string @var{str} into a list of substrings delimited by the
+substring @var{substr}."
+
+  (define substrlen (string-length substr))
+  (define strlen (string-length str))
+
+  (define (loop index start)
+    (cond
+     ((>= start strlen) (list ""))
+     ((not index) (list (substring str start)))
+     (else
+      (cons (substring str start index)
+            (let ((new-start (+ index substrlen)))
+              (loop (string-contains str substr new-start)
+                    new-start))))))
+
+  (cond
+   ((string-contains str substr) => (lambda (idx) (loop idx 0)))
+   (else (list str))))
+
diff --git a/test-suite/tests/strings.test b/test-suite/tests/strings.test
index 7393bc8ec..8bc26e3e3 100644
--- a/test-suite/tests/strings.test
+++ b/test-suite/tests/strings.test
@@ -699,4 +699,24 @@
 
   (pass-if "string-replace-substring"
     (string=? (string-replace-substring "a ring of strings" "ring" "rut")
-              "a rut of struts")))
+              "a rut of struts"))
+
+  (pass-if "string-split-substring - empty string"
+    (equal? (string-split-substring "" "foo")
+            '("")))
+
+  (pass-if "string-split-substring - non-empty, no delimiters"
+    (equal? (string-split-substring "testing" "foo")
+            '("testing")))
+
+  (pass-if "string-split-substring - non-empty, delimiters"
+    (equal? (string-split-substring "testingfoobar" "foo")
+            '("testing" "bar")))
+
+  (pass-if "string-split-substring - non-empty, leading delimiters"
+    (equal? (string-split-substring "foobar" "foo")
+            '("" "bar")))
+
+  (pass-if "string-split-substring - non-empty, trailing delimiters"
+    (equal? (string-split-substring "barfoo" "foo")
+            (list "bar" ""))))
-- 
2.35.1


