unofficial mirror of guile-devel@gnu.org 
* [PATCH] add regexp-split
@ 2011-12-29  9:32 Nala Ginrut
  2011-12-29  9:46 ` Nala Ginrut
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-29  9:32 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 517 bytes --]

Hi Guilers!
It seems there is no "regexp-split" procedure in Guile.
What we have is "string-split", which accepts only a character as the delimiter.
So I wrote one for myself.

------python code-----
>>> import re
>>> re.split("([^0-9])", "123+456*/")
['123', '+', '456', '*', '', '/', '']
--------code end-------

The Guile version:

----------guile code-------
(regexp-split "([^0-9])"  "123+456*/")
==>("123" "+" "456" "*" "" "/" "")
----------code end--------

Anyone interested in it?

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1571 bytes --]

From eb0bb80c86c9539712b78cf8902d230e0c4e778e Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Thu, 29 Dec 2011 17:25:03 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..5a90c67 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,24 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define regexp-split
+  (lambda (regex str)
+    (let* ([len (string-length str)]
+	   [ret (fold-matches 
+		 regex str (list '() 0 0 '(""))
+		 (lambda (m prev)
+		   (let* ([ll (car prev)]
+			  [count (1+ (cadr prev))]
+			  [start (caddr prev)]
+			  [tail (match:suffix m)]
+			  [end (match:start m)]
+			  [s (string-copy str start end)]
+			  )
+		     (list `(,@ll ,s ,(match:substring m)) 
+			   count (match:end m) tail)
+		     )))] ;; end fold-matches
+	   ) ;; end let*
+      `(,@(car ret) ,(cadddr ret))
+      )))
+                                
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
  2011-12-29  9:32 [PATCH] add regexp-split Nala Ginrut
@ 2011-12-29  9:46 ` Nala Ginrut
  2011-12-29 10:20   ` Nala Ginrut
  2011-12-30 10:14 ` Marijn
  2011-12-30 13:03 ` Neil Jerram
  2 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-29  9:46 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 674 bytes --]

Sorry, there was a typo.
Here is the corrected patch.

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1539 bytes --]

From 7ecd9cfbb97b436ed9417f4962bc04264bcdb5e4 Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Thu, 29 Dec 2011 17:45:43 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..2ab28de 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,23 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define regexp-split
+  (lambda (regex str)
+    (let* ([ret (fold-matches 
+		 regex str (list '() 0 0 '(""))
+		 (lambda (m prev)
+		   (let* ([ll (car prev)]
+			  [count (1+ (cadr prev))]
+			  [start (caddr prev)]
+			  [tail (match:suffix m)]
+			  [end (match:start m)]
+			  [s (string-copy str start end)]
+			  )
+		     (list `(,@ll ,s ,(match:substring m)) 
+			   count (match:end m) tail)
+		     )))] ;; end fold-matches
+	   ) ;; end let*
+      `(,@(car ret) ,(cadddr ret))
+      )))
+                                
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
  2011-12-29  9:46 ` Nala Ginrut
@ 2011-12-29 10:20   ` Nala Ginrut
  2011-12-29 13:58     ` Nala Ginrut
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-29 10:20 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 842 bytes --]

Hmm... I think I have now deleted all the debugging leftovers.

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1440 bytes --]

From 39155a1ddebd4da0cd13a4bcae93395f39765c0e Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Thu, 29 Dec 2011 18:19:00 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..2877419 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,21 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define regexp-split
+  (lambda (regex str)
+    (let ([ret (fold-matches 
+		regex str (list '() 0 '(""))
+		(lambda (m prev)
+		  (let* ([ll (car prev)]
+			 [start (cadr prev)]
+			 [tail (match:suffix m)]
+			 [end (match:start m)]
+			 [s (string-copy str start end)]
+			 )
+		    (list `(,@ll ,s ,(match:substring m)) 
+			  (match:end m) tail)
+		    )))])
+      `(,@(car ret) ,(caddr ret))
+      )))
+                          
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
  2011-12-29 10:20   ` Nala Ginrut
@ 2011-12-29 13:58     ` Nala Ginrut
  2011-12-30  5:34       ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-29 13:58 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 1280 bytes --]

Well, this is my fourth submission of this patch.
The reason is that cky emphasized that a "real Lisper" never uses square
brackets. I confess that square brackets are just my own style, so I think
I should make this patch suit all Guilers' taste if I want it to be helpful.
Four attempts for one patch. Oh GOD...

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1440 bytes --]

From 39155a1ddebd4da0cd13a4bcae93395f39765c0e Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Thu, 29 Dec 2011 18:19:00 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..2877419 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,21 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define regexp-split
+  (lambda (regex str)
+    (let ((ret (fold-matches 
+		regex str (list '() 0 '(""))
+		(lambda (m prev)
+		  (let* ((ll (car prev))
+			 (start (cadr prev))
+			 (tail (match:suffix m))
+			 (end (match:start m))
+			 (s (string-copy str start end))
+			 )
+		    (list `(,@ll ,s ,(match:substring m)) 
+			  (match:end m) tail)
+		    )))))
+      `(,@(car ret) ,(caddr ret))
+      )))
+                          
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
  2011-12-29 13:58     ` Nala Ginrut
@ 2011-12-30  5:34       ` Daniel Hartwig
  2011-12-30  8:46         ` Nala Ginrut
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-30  5:34 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

Hello

>>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut <nalaginrut@gmail.com>
>>> wrote:
>>>>
>>>> hi guilers!
>>>> It seems like there's no "regexp-split" procedure in Guile.
>>>> What we have is "string-split" which accepted Char only.
>>>> So I wrote one for myself.
>>>>
>>>> ------python code-----
>>>> >>> import re
>>>> >>> re.split("([^0-9])", "123+456*/")
>>>> ['123', '+', '456', '*', '', '/', '']
>>>> --------code end-------
>>>>
>>>> The Guile version:
>>>>
>>>> ----------guile code-------
>>>> (regexp-split "([^0-9])"  "123+456*/")
>>>> ==>("123" "+" "456" "*" "" "/" "")
>>>> ----------code end--------
>>>>
>>>> Anyone interested in it?
>>>>

Nice work!  I have a couple of comments :-)


The matched delimiter is included in the output whether or not the
pattern has capturing parentheses:

scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
$21 = ("Words" ", " "words" ", " "words" "." "")
scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
$22 = ("Words" ", " "words" ", " "words" "." "")

However, a user is not always interested in the delimiter.  Consider
the example given for string-split:

scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
$23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")

This behaviour can be obtained with list-matches on the complement of
REGEXP.

scheme@(guile-user)> (map match:substring
                          (list-matches "\\w+" "Words, words, words."))
$24 = ("Words" "words" "words")

I would like to see your version support the Python semantics [1]:

> If capturing parentheses are used in pattern, then the text of
> all groups in the pattern are also returned as part of the resulting
> list.
[...]
> >>> re.split('\W+', 'Words, words, words.')
> ['Words', 'words', 'words', '']
> >>> re.split('(\W+)', 'Words, words, words.')
> ['Words', ', ', 'words', ', ', 'words', '.', '']

>>> re.split('((,)?\W+?)', 'Words, words, words.')
['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']
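
As a runnable check of the semantics quoted above (a minimal sketch,
assuming Python 3; raw strings are used so that \W reaches the regex
engine unchanged, and the variable names are only illustrative):

```python
import re

text = "Words, words, words."

# No capturing group: only the text between delimiters is returned.
plain = re.split(r"\W+", text)

# One capturing group: each matched delimiter is returned as well.
with_delims = re.split(r"(\W+)", text)

# Nested groups: every group is returned for every match; a group that
# did not take part in a match shows up as None.
nested = re.split(r"((,)?\W+?)", text)

print(plain)
print(with_delims)
print(nested)
```

The last call is the case a Guile version has to decide on: whether a
group that did not participate should appear in the result at all, and
as what value.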


For the sake of consistency with the rest of the module perhaps
support the `flags' option (just pass it to fold-matches) and use the
same variable names, etc.:

(define* (regexp-split regexp string #:optional (flags 0))
  ...

instead of:

(define regexp-split
  (lambda (regex str)
  ...


Also, to me the name seems unintuitive -- it is STR being split, not
RE -- perhaps this can be folded into the existing string-split
function.


A nice patch nonetheless!


[1] http://docs.python.org/library/re.html#re.split



* Re: [PATCH] add regexp-split
  2011-12-30  5:34       ` Daniel Hartwig
@ 2011-12-30  8:46         ` Nala Ginrut
  2011-12-30  9:05           ` Nala Ginrut
       [not found]           ` <CAN3veRdFQyOthFTSLE7v9x3_A4HTPX99DSmDx26dBkeyy=MTDQ@mail.gmail.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30  8:46 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3265 bytes --]

Hi Daniel! Very glad to see your reply.
1. I also think the argument order (regexp str) is strange, but it follows the
Python version.
And I think 'string-match' also puts regexp before str. Anyway, that's
an easy fix.
2. I think implementing a flag is a little different from the Python version,
since the "ignorecase" flag must
be passed to make-regexp. So we can't use fold-matches.
Hmm... let me see what I can do...


* Re: [PATCH] add regexp-split
  2011-12-30  8:46         ` Nala Ginrut
@ 2011-12-30  9:05           ` Nala Ginrut
       [not found]           ` <CAN3veRdFQyOthFTSLE7v9x3_A4HTPX99DSmDx26dBkeyy=MTDQ@mail.gmail.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30  9:05 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3527 bytes --]

Well, I realize that was a mistake. We can use fold-matches after all.


* Re: [PATCH] add regexp-split
       [not found]           ` <CAN3veRdFQyOthFTSLE7v9x3_A4HTPX99DSmDx26dBkeyy=MTDQ@mail.gmail.com>
@ 2011-12-30  9:42             ` Daniel Hartwig
  2011-12-30 11:40               ` Nala Ginrut
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-30  9:42 UTC (permalink / raw)
  To: guile-devel

On 30 December 2011 16:46, Nala Ginrut <nalaginrut@gmail.com> wrote:
> hi Daniel! Very glad to see your reply.
> 1. I also think the order: (regexp str) is strange. But it's according to
> python version.
> And I think the 'string-match' also put regexp before str. Anyway, that's an
> easy mend.

`regexp string' is also the same order as `list-matches' and
`fold-matches'.  Probably best to keep it that way if this is in the
regex module.


>> I would like to see your version support the Python semantics [1]:
>>
>> > If capturing parentheses are used in pattern, then the text of
>> > all groups in the pattern are also returned as part of the resulting
>> > list.
>> [...]
>> > >>> re.split('\W+', 'Words, words, words.')
>> > ['Words', 'words', 'words', '']
>> > >>> re.split('(\W+)', 'Words, words, words.')
>> > ['Words', ', ', 'words', ', ', 'words', '.', '']
>>
>> >>> re.split('((,)?\W+?)', 'Words, words, words.')
>> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']

FYI this can be achieved by changing the inner part to:

   (let* ...
          (s (substring string start end))
          (groups (map (lambda (n) (match:substring m n))
                       (iota (1- (match:count m)) 1))))
     (list `(,@ll ,s ,@groups) (match:end m) tail)))

Note: using srfi-1 iota
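
The same fold can be sketched in Python to make the bookkeeping
explicit.  This is only an illustration under stated assumptions, not
the Guile patch: regexp_split below is a hypothetical helper built on
re.finditer, and m.groups() plays the role of the
(iota (1- (match:count m)) 1) loop, i.e. groups 1..n without the whole
match.  It also assumes the pattern never matches the empty string.

```python
import re

def regexp_split(pattern, string, flags=0):
    """Split STRING on PATTERN, also returning the text of each group.

    Hypothetical helper for illustration; assumes non-empty matches.
    """
    pieces = []
    start = 0  # end of the previous match, like the fold's second slot
    for m in re.finditer(pattern, string, flags):
        # Text between the previous match and this one.
        pieces.append(string[start:m.start()])
        # Groups 1..n only; group 0 (the whole match) is deliberately skipped.
        pieces.extend(m.groups())
        start = m.end()
    # Whatever follows the last match (possibly the empty string).
    pieces.append(string[start:])
    return pieces

print(regexp_split(r"([^0-9])", "123+456*/"))
print(regexp_split(r"\W+", "Words, words, words."))
```

Starting the group indices at 1 is what makes a pattern without
parentheses drop its delimiters; starting at 0 would put the whole
match back into the result.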



* Re: [PATCH] add regexp-split
  2011-12-29  9:32 [PATCH] add regexp-split Nala Ginrut
  2011-12-29  9:46 ` Nala Ginrut
@ 2011-12-30 10:14 ` Marijn
  2011-12-30 10:56   ` Nala Ginrut
  2011-12-30 13:03 ` Neil Jerram
  2 siblings, 1 reply; 31+ messages in thread
From: Marijn @ 2011-12-30 10:14 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 29-12-11 10:32, Nala Ginrut wrote:
> hi guilers! It seems like there's no "regexp-split" procedure in
> Guile. What we have is "string-split" which accepted Char only. So
> I wrote one for myself.
> 
> ------python code-----
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
> --------code end-------
> 
> The Guile version:
> 
> ----------guile code-------
> (regexp-split "([^0-9])"  "123+456*/")
> ==>("123" "+" "456" "*" "" "/" "")
> ----------code end--------
> 
> Anyone interested in it?

Hi there,

I think we're all happy that Guile is getting this support, however I
couldn't help but notice that the above results look a bit funny and
indeed are incompatible with Racket's implementation:

> (regexp-split "([^0-9])" "123+456*/")
'("123" "456" "" "")

Apparently because their version doesn't support capturing groups in
this function. I've raised the issue with them as well, but there are
some doubts that it is useful/sane to support this. Perhaps other
schemes' regexp libraries should be compared as well. Their tests
would certainly be useful and may point out other incompatibilities
that no-one is aware of (as well as improve your code(!)).
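
For comparison, the Racket-style result (delimiters dropped) can be
reproduced in Python by making the group non-capturing.  A small
sketch, assuming Python 3; the variable names are only illustrative:

```python
import re

text = "123+456*/"

# Capturing group: delimiters are kept, as in the proposed regexp-split.
kept = re.split(r"([^0-9])", text)

# Non-capturing group: delimiters are dropped, matching the Racket output.
dropped = re.split(r"(?:[^0-9])", text)

print(kept)
print(dropped)
```

So the two behaviours are not in conflict as long as the pattern
syntax offers a way to group without capturing.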

Marijn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk79jwgACgkQp/VmCx0OL2zCrgCgrCtBGvKaejnfceWj8RaBz+lm
lfMAoIrR0qr8IFKhFG4KGBevf1LQfoZv
=2x7Y
-----END PGP SIGNATURE-----



* Re: [PATCH] add regexp-split
  2011-12-30 10:14 ` Marijn
@ 2011-12-30 10:56   ` Nala Ginrut
  2011-12-30 11:48     ` Marijn
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 10:56 UTC (permalink / raw)
  To: Marijn; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

Hmm, interesting!
I must confess I'm not familiar with Racket, but I think practicality is one
of Guile's aims,
so I think Guile's regex library should at least offer this.
Anyway, I believe an implementation should do its best to provide any
mechanism that is useful to the user, or it won't stay popular.
When I say "useful", I mean "it is convenient for the user", not
"the developer thinks it's useful".
Just my mumbling, no offense intended.
Thank you for telling us about this issue. ;-)


* Re: [PATCH] add regexp-split
  2011-12-30  9:42             ` Daniel Hartwig
@ 2011-12-30 11:40               ` Nala Ginrut
  2011-12-30 11:47                 ` Nala Ginrut
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 11:40 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 1500 bytes --]

Great! It's better now.
Here's the brand-new patch.

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1519 bytes --]

From b738a8b890f41bf684c0556ca79af2d7c14b6df5 Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Fri, 30 Dec 2011 19:38:38 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..b5f6149 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,19 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define* (regexp-split regex str #:optional (flags 0))
+  (let ((ret (fold-matches 
+	      regex str (list '() 0 '(""))
+	      (lambda (m prev)
+		(let* ((ll (car prev))
+		       (start (cadr prev))
+		       (tail (match:suffix m))
+		       (end (match:start m))
+		       (s (substring/shared str start end))
+		       (groups (map (lambda (n) (match:substring m n))
+				    (iota (1- (match:count m)) 1))))
+		  (list `(,@ll ,s ,@groups) (match:end m) tail)))
+	      flags)))
+    `(,@(car ret) ,(caddr ret))))
+
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
  2011-12-30 11:40               ` Nala Ginrut
@ 2011-12-30 11:47                 ` Nala Ginrut
  2011-12-30 15:23                   ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 11:47 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 1669 bytes --]

Forgot to load iota from (srfi srfi-1), again...

On Fri, Dec 30, 2011 at 7:40 PM, Nala Ginrut <nalaginrut@gmail.com> wrote:

> Great! It's better now.
> Here's the brand new patch~
>
>
> On Fri, Dec 30, 2011 at 5:42 PM, Daniel Hartwig <mandyke@gmail.com> wrote:
>
>> On 30 December 2011 16:46, Nala Ginrut <nalaginrut@gmail.com> wrote:
>> > hi Daniel! Very glad to see your reply.
>> > 1. I also think the order: (regexp str) is strange. But it's according
>> to
>> > python version.
>> > And I think the 'string-match' also put regexp before str. Anyway,
>> that's an
>> > easy mend.
>>
>> `regexp string' is also the same order as `list-matches' and
>> `fold-matches'.  Probably best to keep it that way if this is in the
>> regex module.
>>
>>
>> >> I would like to see your version support the Python semantics [1]:
>> >>
>> >> > If capturing parentheses are used in pattern, then the text of
>> >> > all groups in the pattern are also returned as part of the resulting
>> >> > list.
>> >> [...]
>> >> > >>> re.split('\W+', 'Words, words, words.')
>> >> > ['Words', 'words', 'words', '']
>> >> > >>> re.split('(\W+)', 'Words, words, words.')
>> >> > ['Words', ', ', 'words', ', ', 'words', '.', '']
>> >>
>> >> >>> re.split('((,)?\W+?)', 'Words, words, words.')
>> >> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']
>>
>> FYI this can be achieved by changing the inner part to:
>>
>>    (let* ...
>>           (s (substring string start end))
>>           (groups (map (lambda (n) (match:substring m n))
>>                        (iota (1- (match:count m)) 1))))
>>      (list `(,@ll ,s ,@groups) (match:end m) tail)))
>>
>> Note: using srfi-1 iota
>>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 2754 bytes --]

[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1626 bytes --]

From 27aa85d56766d152eced21cd0d2915c70a99dcc7 Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Fri, 30 Dec 2011 19:46:01 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..e9b01ea 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -38,10 +38,11 @@
 ;;;; POSIX regex support functions.
 
 (define-module (ice-9 regex)
+  #:autoload (srfi srfi-1) (iota)
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +227,19 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define* (regexp-split regex str #:optional (flags 0))
+  (let ((ret (fold-matches 
+	      regex str (list '() 0 '(""))
+	      (lambda (m prev)
+		(let* ((ll (car prev))
+		       (start (cadr prev))
+		       (tail (match:suffix m))
+		       (end (match:start m))
+		       (s (substring/shared str start end))
+		       (groups (map (lambda (n) (match:substring m n))
+				    (iota (1- (match:count m))))))
+		  (list `(,@ll ,s ,@groups) (match:end m) tail)))
+	      flags)))
+    `(,@(car ret) ,(caddr ret))))
+
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 10:56   ` Nala Ginrut
@ 2011-12-30 11:48     ` Marijn
  2011-12-30 11:52       ` Nala Ginrut
  0 siblings, 1 reply; 31+ messages in thread
From: Marijn @ 2011-12-30 11:48 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30-12-11 11:56, Nala Ginrut wrote:
> Hmm, interesting! I must confess I'm not familiar with Racket, but
> I think the aim of Guile contains practicality.

Not sure what you're trying to imply here or to which of my points
you're responding.

> So I think regex-lib of Guile does this at least. Anyway, I believe
> an implementation should do its best to provide any useful
> mechanism for the user. Or it won't be popular anymore. When I talk
> about "useful", I mean "it brings the user convenient", not "the
> developer think it's useful".

Idem ditto.

> Just my mumble, no any offense. Thank you for telling us this
> issue. ;-)

Marijn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk79pQ8ACgkQp/VmCx0OL2wMewCgml8guLqLK2fx7NWHa1JQ7pQ9
wrIAoLkHJnNF8nkWWMM4EKkyBEyffZEQ
=1IRy
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 11:48     ` Marijn
@ 2011-12-30 11:52       ` Nala Ginrut
  2011-12-30 13:23         ` Marijn
  0 siblings, 1 reply; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 11:52 UTC (permalink / raw)
  To: Marijn; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]

I just expressed "I think group capturing is useful and someone didn't
think that's true".
If this is not what your last mail mean, I think it's better to ignore it.

On Fri, Dec 30, 2011 at 7:48 PM, Marijn <hkBst@gentoo.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 30-12-11 11:56, Nala Ginrut wrote:
> > Hmm, interesting! I must confess I'm not familiar with Racket, but
> > I think the aim of Guile contains practicality.
>
> Not sure what you're trying to imply here or to which of my points
> you're responding.
>
> > So I think regex-lib of Guile does this at least. Anyway, I believe
> > an implementation should do its best to provide any useful
> > mechanism for the user. Or it won't be popular anymore. When I talk
> > about "useful", I mean "it brings the user convenient", not "the
> > developer think it's useful".
>
> Idem ditto.
>
> > Just my mumble, no any offense. Thank you for telling us this
> > issue. ;-)
>
> Marijn
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.18 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk79pQ8ACgkQp/VmCx0OL2wMewCgml8guLqLK2fx7NWHa1JQ7pQ9
> wrIAoLkHJnNF8nkWWMM4EKkyBEyffZEQ
> =1IRy
> -----END PGP SIGNATURE-----
>

[-- Attachment #2: Type: text/html, Size: 1847 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-29  9:32 [PATCH] add regexp-split Nala Ginrut
  2011-12-29  9:46 ` Nala Ginrut
  2011-12-30 10:14 ` Marijn
@ 2011-12-30 13:03 ` Neil Jerram
  2011-12-30 15:12   ` Nala Ginrut
  2011-12-30 15:33   ` Daniel Hartwig
  2 siblings, 2 replies; 31+ messages in thread
From: Neil Jerram @ 2011-12-30 13:03 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

Nala Ginrut <nalaginrut@gmail.com> writes:

> hi guilers!
> It seems like there's no "regexp-split" procedure in Guile.
> What we have is "string-split" which accepted Char only.
> So I wrote one for myself.

We've had this topic before, and it only needs a search for
"regex-split guile" to find it:
http://old.nabble.com/regex-split-for-Guile-td31093245.html.

       Neil



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 11:52       ` Nala Ginrut
@ 2011-12-30 13:23         ` Marijn
  2011-12-30 14:57           ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Marijn @ 2011-12-30 13:23 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30-12-11 12:52, Nala Ginrut wrote:
> I just expressed "I think group capturing is useful and someone 
> didn't think that's true". If this is not what your last mail
> mean, I think it's better to ignore it.

Group capturing is useful, but the question is whether it is useful in
the context of regexp-split. Maybe it is, maybe it isn't. Racket seems
to be doing it differently than python, so I think that constitutes
reason to look more closely. Certainly guile should follow racket over
python, everything else being equal, but usually everything isn't
equal if only one has a look and I'm saying that we should look at
least at other schemes for inspiration.
If you're so convinced that python is doing it right here and should
be followed, then perhaps you can give some examples of how capturing
groups are useful in a function that is supposed to split strings at
regexps.

Another data point:

[14:17] <hkBst> what does chicken return for (irregex-split "([^0-9])"
 "123+456*/")  ?
[14:18] <sjamaan> ("123" "456")

Looks like chicken doesn't do capturing groups in their version, but
they don't have the empty matches either. How about that...

Surely by now you can see that it's worth discussing the
semantics of regexp-split.

Marijn



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk79u1YACgkQp/VmCx0OL2xpYACgpYuguKw4ju0GsX3ApqrZtjXF
ppsAn2wv0B8sNiSgtULA1TIFjiXh2Pdn
=C8E4
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 13:23         ` Marijn
@ 2011-12-30 14:57           ` Daniel Hartwig
  2011-12-31  1:46             ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-30 14:57 UTC (permalink / raw)
  To: Marijn; +Cc: guile-devel

On 30 December 2011 21:23, Marijn <hkBst@gentoo.org> wrote:
> Group capturing is useful, but the question is whether it is useful in
> the context of regexp-split. Maybe it is, maybe it isn't. Racket seems
> to be doing it differently than python, so I think that constitutes
> reason to look more closely. Certainly guile should follow racket over
> python, everything else being equal, but usually everything isn't
> equal if only one has a look and I'm saying that we should look at
> least at other schemes for inspiration.
> If you're so convinced that python is doing it right here and should
> be followed, then perhaps you can give some examples of how capturing
> groups are useful in a function that is supposed to split strings at
> regexps.

Having the *option* to return the captured groups in `regexp-split' is
certainly useful -- consider implementing a parser [1].  If the
captured groups are not desired, then simply omit the grouping parens
from the expression.

[1] http://80.68.89.23/2003/Oct/26/reSplit/

>
> Another data point:
>
> [14:17] <hkBst> what does chicken return for (irregex-split "([^0-9])"
>  "123+456*/")  ?
> [14:18] <sjamaan> ("123" "456")
>
> Looks like chicken doesn't do capturing groups in their version, but
> they don't have the empty matches either. How about that...

For tokenizing I think you want to keep any empty strings, otherwise
you lose track of which `field' you are in (consider /etc/passwd
entries).  This also matches the existing behaviour of `string-split'.
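
A minimal sketch of the field-tracking point, using only the core
`string-split'. The passwd entry is made up for illustration:

```scheme
;; Hypothetical /etc/passwd entry whose 5th (GECOS) field is empty.
(define entry "daemon:x:1:1::/usr/sbin:/usr/sbin/nologin")

;; string-split keeps the empty string, so every field stays at its index.
(define fields (string-split entry #\:))

(write fields) (newline)
;; The home directory is still found at index 5.
(write (list-ref fields 5)) (newline)
```

If the empty string were dropped, the home directory would shift to
index 4 and the shell to index 5.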



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 13:03 ` Neil Jerram
@ 2011-12-30 15:12   ` Nala Ginrut
  2011-12-30 16:26     ` Neil Jerram
  2012-01-07 22:44     ` Andy Wingo
  2011-12-30 15:33   ` Daniel Hartwig
  1 sibling, 2 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 15:12 UTC (permalink / raw)
  To: Neil Jerram; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]

Well, I see.
So the previous discussion didn't make this proc put into Guile?
Now that so many people interested in this topic.

On Fri, Dec 30, 2011 at 9:03 PM, Neil Jerram <neil@ossau.homelinux.net>wrote:

> Nala Ginrut <nalaginrut@gmail.com> writes:
>
> > hi guilers!
> > It seems like there's no "regexp-split" procedure in Guile.
> > What we have is "string-split" which accepted Char only.
> > So I wrote one for myself.
>
> We've had this topic before, and it only needs a search for
> "regex-split guile" to find it:
> http://old.nabble.com/regex-split-for-Guile-td31093245.html.
>
>       Neil
>

[-- Attachment #2: Type: text/html, Size: 1185 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 11:47                 ` Nala Ginrut
@ 2011-12-30 15:23                   ` Daniel Hartwig
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-30 15:23 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On 30 December 2011 19:47, Nala Ginrut <nalaginrut@gmail.com> wrote:
> Forgot to load iota from (srfi srfi-1), again...
>
>
> On Fri, Dec 30, 2011 at 7:40 PM, Nala Ginrut <nalaginrut@gmail.com> wrote:
>>
>> Great! It's better now.
>> Here's the brand new patch~

I notice that this does not handle the case where there are no matches:

scheme@(guile-user)> (regexp-split "[^0-9]" "123")
$26 = ((""))


  (let ((ret (fold-matches
              regex str (list '() 0 '(""))

becomes:

  (let ((ret (fold-matches
	      regex str (list '() 0 str)

and the result:

scheme@(guile-user)> (regexp-split "[^0-9]" "123")
$28 = ("123")
scheme@(guile-user)> (string-split "123" #\!)
$29 = ("123")


I also note that you are using `substring/shared' when I think you are
after `substring'.  Both of these are efficient and use shared memory
when they can, but there is a difference.
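
A small sketch of that difference, as described in the Guile manual:
`substring' is copy-on-write, while `substring/shared' keeps sharing
storage after a modification. The sample string is arbitrary:

```scheme
;; string-copy gives a mutable string (literals may be read-only).
(define str (string-copy "hello world"))

(define copy   (substring str 0 5))          ; copy-on-write
(define shared (substring/shared str 0 5))   ; keeps sharing storage

;; Mutating the shared substring writes through to `str',
;; while the copy-on-write substring is left alone.
(string-set! shared 0 #\J)

(write str)    (newline)
(write copy)   (newline)
(write shared) (newline)
```

For a splitting procedure the caller normally expects independent
strings, which is the behaviour of `substring'.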



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 13:03 ` Neil Jerram
  2011-12-30 15:12   ` Nala Ginrut
@ 2011-12-30 15:33   ` Daniel Hartwig
  2011-12-30 15:58     ` Nala Ginrut
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-30 15:33 UTC (permalink / raw)
  To: Neil Jerram; +Cc: guile-devel

On 30 December 2011 21:03, Neil Jerram <neil@ossau.homelinux.net> wrote:
> Nala Ginrut <nalaginrut@gmail.com> writes:
>
>> hi guilers!
>> It seems like there's no "regexp-split" procedure in Guile.
>> What we have is "string-split" which accepted Char only.
>> So I wrote one for myself.
>
> We've had this topic before, and it only needs a search for
> "regex-split guile" to find it:
> http://old.nabble.com/regex-split-for-Guile-td31093245.html.
>

Good to see that there is continuing interest in this feature.

IMO, the implementation here is more elegant and readable for its use
of `fold-matches'.  The first implementation from the thread you
mention effectively rolls its own version of `fold-matches' over the
result of `list-matches' (which is implemented using `fold-matches'
!).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 15:33   ` Daniel Hartwig
@ 2011-12-30 15:58     ` Nala Ginrut
  0 siblings, 0 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 15:58 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

Since we already have a previous thread on this topic, I think there's no
need to format a patch.

Maybe this will solve the problem:
(define* (regexp-split regex str #:optional (flags 0))
  ;; Accumulator: (pieces-so-far next-start tail).
  (let ((ret (fold-matches
              regex str (list '() 0 str)
              (lambda (m prev)
                (let* ((ll (car prev))
                       (start (cadr prev))
                       (tail (match:suffix m))
                       (end (match:start m))
                       (s (substring/shared str start end))
                       ;; Groups 1..n only; needs iota from (srfi srfi-1).
                       (groups (map (lambda (n) (match:substring m n))
                                    (iota (1- (match:count m)) 1))))
                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
              flags)))
    `(,@(car ret) ,(caddr ret))))


On Fri, Dec 30, 2011 at 11:33 PM, Daniel Hartwig <mandyke@gmail.com> wrote:

> On 30 December 2011 21:03, Neil Jerram <neil@ossau.homelinux.net> wrote:
> > Nala Ginrut <nalaginrut@gmail.com> writes:
> >
> >> hi guilers!
> >> It seems like there's no "regexp-split" procedure in Guile.
> >> What we have is "string-split" which accepted Char only.
> >> So I wrote one for myself.
> >
> > We've had this topic before, and it only needs a search for
> > "regex-split guile" to find it:
> > http://old.nabble.com/regex-split-for-Guile-td31093245.html.
> >
>
> Good to see that there is continuing interest in this feature.
>
> IMO, the implementation here is more elegant and readable for its use
> of `fold-matches'.  The first implementation from the thread you
> mention effectively rolls its own version of `fold-matches' over the
> result of `list-matches' (which is implemented using `fold-matches'
> !).
>

[-- Attachment #2: Type: text/html, Size: 3072 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 15:12   ` Nala Ginrut
@ 2011-12-30 16:26     ` Neil Jerram
  2011-12-30 16:46       ` Nala Ginrut
  2012-01-07 22:44     ` Andy Wingo
  1 sibling, 1 reply; 31+ messages in thread
From: Neil Jerram @ 2011-12-30 16:26 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

Nala Ginrut <nalaginrut@gmail.com> writes:

> Well, I see.
> So the previous discussion didn't make this proc put into Guile?
> Now that so many people interested in this topic. 

I'm afraid I can't recall what happened following that thread.

What feels important to me, though, is the elegance of the overall API.
There are already _some_ regex-related APIs in the core Guile library
(ice-9 regex) and I would guess that there are many many possible
variations of these and other string + regex processing APIs that one
might propose.  Also we've now demonstrated that regex-split can be
implemented, on top of the existing library, with only a few lines of
code.  Therefore I'd say (speaking only as an observer) that you need to
make a case for how your regex-split beautifully complements what's
already there in (ice-9 regex), or alternatively for replacing (ice-9
regex) with a more beautiful set of operations including regex-split.

Alternatively^2, you could package regex-split outside the core library,
as a test case for the guild hall.  Then it doesn't need to be justified
in relation to (ice-9 regex), it can just be a convenient module that
provides a more Python-like API.

Regards,
     Neil



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 16:26     ` Neil Jerram
@ 2011-12-30 16:46       ` Nala Ginrut
  0 siblings, 0 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-30 16:46 UTC (permalink / raw)
  To: Neil Jerram; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1509 bytes --]

OK, I'll put this proc in my own lib since there's no regexp-lib but a
regexp-core in Guile. Anyway, it's almost completed now. One may copy the
final version if needed.

On Sat, Dec 31, 2011 at 12:26 AM, Neil Jerram <neil@ossau.homelinux.net>wrote:

> Nala Ginrut <nalaginrut@gmail.com> writes:
>
> > Well, I see.
> > So the previous discussion didn't make this proc put into Guile?
> > Now that so many people interested in this topic.
>
> I'm afraid I can't recall what happened following that thread.
>
> What feels important to me, though, is the elegance of the overall API.
> There are already _some_ regex-related APIs in the core Guile library
> (ice-9 regex) and I would guess that there are many many possible
> variations of these and other string + regex processing APIs that one
> might propose.  Also we've now demonstrated that regex-split can be
> implemented, on top of the existing library, with only a few lines of
> code.  Therefore I'd say (speaking only as an observer) that you need to
> make a case for how your regex-split beautifully complements what's
> already there in (ice-9 regex), or alternatively for replacing (ice-9
> regex) with a more beautiful set of operations including regex-split.
>
> Alternatively^2, you could package regex-split outside the core library,
> as a test case for the guild hall.  Then it doesn't need to be justified
> in relation to (ice-9 regex), it can just be a convenient module that
> provides a more Python-like API.
>
> Regards,
>     Neil
>

[-- Attachment #2: Type: text/html, Size: 1956 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-30 14:57           ` Daniel Hartwig
@ 2011-12-31  1:46             ` Daniel Hartwig
  2011-12-31  2:32               ` Eli Barzilay
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-31  1:46 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: guile-devel

Hello

On 31 December 2011 04:11, Eli Barzilay <eli@barzilay.org> wrote:
> [I don't think that I'm subscribed to the Guile list, but feel free to
> forward it there.]

Copying back the list and Marijn.

> 5 hours ago, Marijn wrote:
>> On 30-12-11 12:52, Nala Ginrut wrote:
>> > I just expressed "I think group capturing is useful and someone
>> > didn't think that's true". If this is not what your last mail
>> > mean, I think it's better to ignore it.
>>
>> Group capturing is useful, but the question is whether it is useful
>> in the context of regexp-split.
>
> Yes, that's exactly the point.  What I'm worried about is someone
> defining a regexp for several uses, for example:
>
>  (define rx "foo([0-9]*)")
>
> with the intention of using it for both splitting and other
> extraction.  (This is a bad example but it's a common case for
> regexps.)  The problem is that if you really want to just *split* with
> this pattern, you're stuck in bad-code-land...  Two possible
> solutions:
>
>  Do the split, then filter out the even-numbered items from the
>  result.
>
> This is bad not only because it's inefficiently allocating substrings
> that will get discarded (investing work redundantly which will get
> trashed by more work) -- it's also bad because such code is sensitive
> to the number of groups.  Eg, if the pattern is changed to have two
> groups, then you need to modify the filtering now.  Another solution:
>
>  Tweak the regexp and turn all groups into non-capturing groups.
>
> This is something that I've run into several times, and IME it is a
> very bad solution.  Usually, you end up doing some half-assed job of
> this tweaking: you don't bother to cache the compiled expressions for
> speed, and you tend to introduce assumptions by mistake -- like
> assuming that all "("s that are not preceded by a backslash or
> followed by a "?:" are groups -- and fail miserably when the input is
> something like "...\\\\(..." or "...[0-9()]...".  Actually, that leads
> into yet another solution:
>
>  Explicitly say that your functions expect patterns without groups.
>
> That fails since it propagates the problem up for users of your code
> (they might need to maintain two versions of regexps too).  And since
> many of them are likely to skim the docs and just do whatever works
> for them, they can easily write code that fails to satisfy these
> assumptions -- and the fun part is that when this happens, the result
> of such bugs is utterly confusing...
>
> Four hours ago, Daniel Hartwig wrote:
>> Having the *option* to return the captured groups in `regexp-split' is
>> certainly useful -- consider implementing a parser [1].  If the
>> captured groups are not desired, then simply omit the grouping parens
>> from the expression.
>
> Hopefully the above explains why I think that that "simply omit" can
> turn out to be a disaster...
>
> In any case, that's my reason for disliking that added functionality
> even if it "can be more useful".  Lucky for me, in Racket we also have
> the existing behavior with code that will break if we change it, so I
> don't need to argue my point much...
>
> And BTW, all of that is *not* to say that this functionality is
> useless -- just arguing for it to be provided under a different name.
>


How about having an optional argument to control the behaviour?  The
default could be to not include the groups, thus mimicking the output
of Guile's `string-split' and `regexp-split' in other Schemes.

If two procedures are implemented they will be almost verbatim copies
of each other.  The changes required in the body would be minimal:

 (groups (if incl-groups?
             (map (lambda (n) (match:substring m n))
                  (iota (1- (match:count m)) 1))
             '())))

>
>> [...] If you're so convinced that python is doing it right here and
>> should be followed, then perhaps you can give some examples of how
>> capturing groups are useful in a function that is supposed to split
>> strings at regexps.
>
> I don't think that such examples will help.  It's obvious how it can
> be useful to have this feature -- the main issue is the kind of bugs
> that it will lead to.  (And in the above I tried to give some examples
> of how that's bad.)
>
>
>> Another data point:
>>
>> [14:17] <hkBst> what does chicken return for (irregex-split "([^0-9])"
>>  "123+456*/")  ?
>> [14:18] <sjamaan> ("123" "456")
>>
>> Looks like chicken doesn't do capturing groups in their version, but
>> they don't have the empty matches either. How about that...
>
> Yeah, we've considered these things for a while.  There is
> inconsistency between different languages and regexp libraries on how
> to deal with empty strings -- some drop them at the edges, some drop
> all of them, and IIRC, some even drop all empty strings.  Oh,
> and things get infinitely more amusing when you consider look-ahead
> and look-back patterns (including \b patterns)...
>
> You can see our tests here:
>
>  https://github.com/plt/racket/blob/master/collects/tests/racket/string.rktl#L37
>
> with some comparisons to Perl.
>

No comment on Perl's handling.

I think Racket does the right thing by keeping *all* the empty strings in place.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-31  1:46             ` Daniel Hartwig
@ 2011-12-31  2:32               ` Eli Barzilay
  2011-12-31  3:16                 ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Barzilay @ 2011-12-31  2:32 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

40 minutes ago, Daniel Hartwig wrote:
> 
> How about having an optional argument to control the behaviour?  The
> default could be to not include the groups, thus mimicking the
> output of Guile's `string-split' and `regexp-split' in other
> Schemes.

That can work, though I personally prefer a separate name.  (But
obviously, my personal taste has zero weight for guile...)


> If two procedures are implemented they will be almost verbatim copies
> of each other.

Yeah, but that's not an argument in favor or against -- since you can
switch between:

  (define (foo x [other-behavior? #f]) ...code..)

and

  (define (foo-internal x other-behavior?) ...same code...)
  (define (foo x) (foo-internal x #f))
  (define (foo-other x) (foo-internal x #t))

where the internal function is not exported from the library.
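
In Guile terms, this could look roughly like the sketch below.  The
names `regexp-split-internal' and `regexp-split/groups' are
hypothetical and not part of (ice-9 regex); the body follows the
`fold-matches' approach from earlier in the thread, with `iota' taken
from (srfi srfi-1):

```scheme
(use-modules (ice-9 regex)
             (srfi srfi-1))   ; iota with a start argument

;; Internal helper, not meant to be exported.
(define (regexp-split-internal regex str incl-groups? flags)
  (let ((ret (fold-matches
              regex str (list '() 0)   ; (reversed-pieces next-start)
              (lambda (m prev)
                (let* ((parts (car prev))
                       (start (cadr prev))
                       (s (substring str start (match:start m)))
                       ;; Groups 1..n of this match, or nothing.
                       (groups (if incl-groups?
                                   (map (lambda (n) (match:substring m n))
                                        (iota (1- (match:count m)) 1))
                                   '())))
                  (list (append (reverse groups) (cons s parts))
                        (match:end m))))
              flags)))
    ;; Add the text after the last match (the whole string if no match).
    (reverse (cons (substring str (cadr ret)) (car ret)))))

;; Plain split: captured groups are never returned.
(define* (regexp-split regex str #:optional (flags 0))
  (regexp-split-internal regex str #f flags))

;; Python-like split: captured groups are spliced into the result.
(define* (regexp-split/groups regex str #:optional (flags 0))
  (regexp-split-internal regex str #t flags))

(write (regexp-split "([^0-9])" "123+456*/")) (newline)
(write (regexp-split/groups "([^0-9])" "123+456*/")) (newline)
```

With this layout the same regexp, groups included, can be passed to
either procedure, and only the second one returns the delimiters.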


> No comment on Perl's handling.
> 
> I think Racket does the right thing by keeping *all* the empty
> strings in place.

Well, I do think that Perl (as well as other libraries & languages)
are a good reference point to compare against...  If anything, you
should at least be aware of other design choices and why you went in a
different direction.  (And we did not follow perl in all aspects, as
those tests clarify.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-31  2:32               ` Eli Barzilay
@ 2011-12-31  3:16                 ` Daniel Hartwig
  2011-12-31  3:21                   ` Eli Barzilay
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-31  3:16 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: guile-devel

On 31 December 2011 10:32, Eli Barzilay <eli@barzilay.org> wrote:
> 40 minutes ago, Daniel Hartwig wrote:
>> If two procedures are implemented they will be almost verbatim copies
>> of each other.
>
> Yeah, but that's not an argument in favor or against -- since you can
> switch between:
>
>  (define (foo x [other-behavior? #f]) ...code..)
>
> and
>
>  (define (foo-internal x other-behavior?) ...same code...)
>  (define (foo x) (foo-internal x #f))
>  (define (foo-other x) (foo-internal x #t))
>
> where the internal function is not exported from the library.

Ah, I did not think of that :-)

>
>
>> No comment on Perl's handling.
>>
>> I think Racket does the right thing by keeping *all* the empty
>> strings in place.
>
> Well, I do think that Perl (as well as other libraries & languages)
> are a good reference point to compare against...  If anything, you
> should at least be aware of other design choices and why you went in a
> different direction.  (And we did not follow perl in all aspects, as
> those tests clarify.)
>

A good point.  I'm interested to find out the reasoning behind Perl's
decision to drop empty strings..  Seems a strange thing to do IMO.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-31  3:16                 ` Daniel Hartwig
@ 2011-12-31  3:21                   ` Eli Barzilay
  2011-12-31  4:37                     ` Daniel Hartwig
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Barzilay @ 2011-12-31  3:21 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

Just now, Daniel Hartwig wrote:
> On 31 December 2011 10:32, Eli Barzilay <eli@barzilay.org> wrote:
> > 40 minutes ago, Daniel Hartwig wrote:
> >>
> >> I think Racket does the right thing by keeping *all* the empty
> >> strings in place.
> >
> > Well, I do think that Perl (as well as other libraries &
> > languages) are a good reference point to compare against...  If
> > anything, you should at least be aware of other design choices and
> > why you went in a different direction.  (And we did not follow
> > perl in all aspects, as those tests clarify.)
> 
> A good point.  I'm interested to find out the reasoning behind
> Perl's decision to drop empty strings..  Seems a strange thing to do
> IMO.

I think that there's a general tendency to make things "nice" and
dropping these things for cases where what the user wants is
"obvious".  And then you realize that making the function behave
differently sometimes is a bad idea, but you can't back off from the
earlier version without breaking a ton of code.  In any case, look
also at the Emacs solution of an optional argument to drop all empty
strings, with a weird behavior when no regexp is given...

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] add regexp-split
  2011-12-31  3:21                   ` Eli Barzilay
@ 2011-12-31  4:37                     ` Daniel Hartwig
  2011-12-31  7:00                       ` Eli Barzilay
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Hartwig @ 2011-12-31  4:37 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: guile-devel

On 31 December 2011 11:21, Eli Barzilay <eli@barzilay.org> wrote:
>> A good point.  I'm interested to find out the reasoning behind
>> Perl's decision to drop empty strings..  Seems a strange thing to do
>> IMO.
>
> I think that there's a general tendency to make things "nice" and
> dropping these things for cases where what the user wants is
> "obvious".  And then you realize that making the function behave
> differently sometimes is a bad idea, but you can't back off from the
> earlier version without breaking a ton of code.  In any case, look
> also at the Emacs solution of an optional argument to drop all empty
> strings, with a weird behavior when no regexp is given...

In Scheme it is easy for the user to remove the empty strings if
desired.  In Perl I'd say that this at least involves writing a loop
each time, hence their choice for the default "nice" behaviour.

The ease of using `filter' is a good case for keeping the empty
strings in the Scheme version.
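
As a rough illustration of the same idea in Python (the reference
language used earlier in this thread, not Scheme's `filter' itself),
dropping the empty strings after the fact is a one-line step.  The
pattern and input are the ones from the original post, minus the
capturing group; the variable names are only for illustration.

```python
import re

# Split on any non-digit; empty strings are kept by default.
parts = re.split(r"[^0-9]", "123+456*/")

# The caller drops empty pieces only if that is what they want.
non_empty = [p for p in parts if p]

print(parts)      # ['123', '456', '', '']
print(non_empty)  # ['123', '456']
```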

I could not find any mention of this optional Emacs arg. you talk
about; have a pointer for me?




* Re: [PATCH] add regexp-split
  2011-12-31  4:37                     ` Daniel Hartwig
@ 2011-12-31  7:00                       ` Eli Barzilay
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Barzilay @ 2011-12-31  7:00 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

Two hours ago, Daniel Hartwig wrote:
> 
> I could not find any mention of this optional Emacs arg. you talk
> about; have a pointer for me?

It's the last optional argument of `split-string'.  The last paragraph
in the documentation notes that when you call this function with just
a single argument, then that last optional flag is t -- which is an
unconventional thing for elisp functions...

(BTW, I'm subscribed to the list now, so this should go through.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!




* Re: [PATCH] add regexp-split
  2011-12-30 15:12   ` Nala Ginrut
  2011-12-30 16:26     ` Neil Jerram
@ 2012-01-07 22:44     ` Andy Wingo
  1 sibling, 0 replies; 31+ messages in thread
From: Andy Wingo @ 2012-01-07 22:44 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On Fri 30 Dec 2011 16:12, Nala Ginrut <nalaginrut@gmail.com> writes:

> So the previous discussion didn't make this proc put into Guile?
> Now that so many people interested in this topic. 

I think it would make a nice addition to (ice-9 regex).  I don't think
we need to capture the delimiters though; some other function can do
that.

What do you think about this:

  (define* (regexp-split regex str #:optional (flags 0))
    (let ((ret (fold-matches
                regex str (cons '() 0)
                (lambda (m prev)
                  (let ((parts (car prev))
                        (start (cdr prev)))
                    (cons (cons (substring str start (match:start m))
                                parts)
                          (match:end m))))
                flags)))
      (reverse (cons (substring str (cdr ret)) (car ret)))))
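
For readers following along in Python, here is a minimal sketch of
the same fold, under the assumption that `re.finditer' plays the role
of `fold-matches'.  The name `regexp_split' is hypothetical, and
Python's `flags' are not the same values as Guile's regexp flags.
Delimiters are not captured and empty pieces are kept.

```python
import re

def regexp_split(pattern, text, flags=0):
    """Split text around matches of pattern; delimiters are dropped."""
    parts = []
    start = 0  # end of the previous match, like (cdr prev) above
    for m in re.finditer(pattern, text, flags):
        # Piece between the previous match and this one.
        parts.append(text[start:m.start()])
        start = m.end()
    # Whatever follows the last match, possibly the empty string.
    parts.append(text[start:])
    return parts

print(regexp_split(r"[^0-9]", "123+456*/"))  # ['123', '456', '', '']
```

Building the list in order avoids the final `reverse'; this sketch
does not consider patterns that can match the empty string.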

Andy
-- 
http://wingolog.org/




* Re: [PATCH] add regexp-split
@ 2013-02-01  9:24 Nala Ginrut
  0 siblings, 0 replies; 31+ messages in thread
From: Nala Ginrut @ 2013-02-01  9:24 UTC (permalink / raw)
  To: guile-devel

I found a bug in my previous regexp-split implementation, and have fixed it now:

-------------------------------------code---------------------------------
(define* (regexp-split regex str #:optional (flags 0))
  (let ((ret (fold-matches 
              regex str (list '() 0 str)
              (lambda (m prev)
                (let* ((ll (car prev))
                       (start (cadr prev))
                       (tail (match:suffix m))
                       (end (match:start m))
                       (s (substring/shared str start end))
                       (groups (map (lambda (n) (match:substring m n))
                                    (iota (1- (match:count m)) 1))))
                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
              flags)))
    `(,@(car ret) ,(caddr ret))))
-------------------------------------end---------------------------------

Now it works like Python's re.split:
(regexp-split "([^ ]+) (.+)" "a b[^ _]") 
==> ("" "a" "b[^ _]" "")

(regexp-split "([^0-9])([^+/*])" "123+456*/")
==> ("123" "+" "4" "56*/")
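
The two results above can be checked against Python itself; this is
only a comparison sketch, and the variable names are illustrative.

```python
import re

# Whole string matches once: empty prefix, two groups, empty suffix.
first = re.split(r"([^ ]+) (.+)", "a b[^ _]")

# Only "+4" matches; "*/" does not, because "/" is excluded by the
# second character class.
second = re.split(r"([^0-9])([^+/*])", "123+456*/")

print(first)   # ['', 'a', 'b[^ _]', '']
print(second)  # ['123', '+', '4', '56*/']
```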

I discussed this with Andy: regexp-split is such a common thing that
we should add it to (ice-9 regex).

But there are three implementations so far: mine, cky's, and this
one:
http://lists.gnu.org/archive/html/guile-user/2011-03/msg00007.html

So... I'll leave the decision to the maintainers. ;-)

The difference between them may be this: cky's is Perl style (plus
Ruby/Java), and mine is Python's (though I hate Python ;-P).

It's not important which of them is chosen; what really matters is
that we do need regexp-split in Guile.

Regards.


Nala Ginrut <nalaginrut <at> gmail.com> writes:

> 
> 
> Now that we have a previous thread on this topic, I think there's no
> need to format a patch.
> 
> Maybe this will solve the problem:
> 
> (define* (regexp-split regex str #:optional (flags 0))
>   (let ((ret (fold-matches
>               regex str (list '() 0 str)
>               (lambda (m prev)
>                 (let* ((ll (car prev))
>                        (start (cadr prev))
>                        (tail (match:suffix m))
>                        (end (match:start m))
>                        (s (substring/shared str start end))
>                        (groups (map (lambda (n) (match:substring m n))
>                                     (iota (1- (match:count m))))))
>                   (list `(,@ll ,s ,@groups) (match:end m) tail)))
>               flags)))
>     `(,@(car ret) ,(caddr ret))))
> 
> On Fri, Dec 30, 2011 at 11:33 PM, Daniel Hartwig <mandyke <at> gmail.com> wrote:
> On 30 December 2011 21:03, Neil Jerram <neil <at> ossau.homelinux.net> wrote:
> 
> > Nala Ginrut <nalaginrut <at> gmail.com> writes:
> >
> >> hi guilers!
> >> It seems like there's no "regexp-split" procedure in Guile.
> >> What we have is "string-split" which accepted Char only.
> >> So I wrote one for myself.
> >
> > We've had this topic before, and it only needs a search for
> > "regex-split guile" to find it:
> > http://old.nabble.com/regex-split-for-Guile-td31093245.html.
> >
> Good to see that there is continuing interest in this feature.
> IMO, the implementation here is more elegant and readable for its use
> of `fold-matches'.  The first implementation from the thread you
> mention effectively rolls its own version of `fold-matches' over the
> result of `list-matches' (which is implemented using `fold-matches'
> !).





end of thread, other threads:[~2013-02-01  9:24 UTC | newest]

Thread overview: 31+ messages
2011-12-29  9:32 [PATCH] add regexp-split Nala Ginrut
2011-12-29  9:46 ` Nala Ginrut
2011-12-29 10:20   ` Nala Ginrut
2011-12-29 13:58     ` Nala Ginrut
2011-12-30  5:34       ` Daniel Hartwig
2011-12-30  8:46         ` Nala Ginrut
2011-12-30  9:05           ` Nala Ginrut
     [not found]           ` <CAN3veRdFQyOthFTSLE7v9x3_A4HTPX99DSmDx26dBkeyy=MTDQ@mail.gmail.com>
2011-12-30  9:42             ` Daniel Hartwig
2011-12-30 11:40               ` Nala Ginrut
2011-12-30 11:47                 ` Nala Ginrut
2011-12-30 15:23                   ` Daniel Hartwig
2011-12-30 10:14 ` Marijn
2011-12-30 10:56   ` Nala Ginrut
2011-12-30 11:48     ` Marijn
2011-12-30 11:52       ` Nala Ginrut
2011-12-30 13:23         ` Marijn
2011-12-30 14:57           ` Daniel Hartwig
2011-12-31  1:46             ` Daniel Hartwig
2011-12-31  2:32               ` Eli Barzilay
2011-12-31  3:16                 ` Daniel Hartwig
2011-12-31  3:21                   ` Eli Barzilay
2011-12-31  4:37                     ` Daniel Hartwig
2011-12-31  7:00                       ` Eli Barzilay
2011-12-30 13:03 ` Neil Jerram
2011-12-30 15:12   ` Nala Ginrut
2011-12-30 16:26     ` Neil Jerram
2011-12-30 16:46       ` Nala Ginrut
2012-01-07 22:44     ` Andy Wingo
2011-12-30 15:33   ` Daniel Hartwig
2011-12-30 15:58     ` Nala Ginrut
  -- strict thread matches above, loose matches on Subject: below --
2013-02-01  9:24 Nala Ginrut
