unofficial mirror of guile-devel@gnu.org 
* [PATCH] add regexp-split
@ 2011-12-29  9:32 Nala Ginrut
  2011-12-29  9:46 ` Nala Ginrut
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Nala Ginrut @ 2011-12-29  9:32 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 517 bytes --]

hi guilers!
It seems there's no "regexp-split" procedure in Guile.
What we have is "string-split", which accepts only a char as the delimiter.
So I wrote one for myself.

------python code-----
>>> import re
>>> re.split("([^0-9])", "123+456*/")
['123', '+', '456', '*', '', '/', '']
--------code end-------

The Guile version:

----------guile code-------
(regexp-split "([^0-9])"  "123+456*/")
==>("123" "+" "456" "*" "" "/" "")
----------code end--------
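
For readers wondering why the pattern is wrapped in parentheses, here is a small sketch of the Python side only. It shows that re.split keeps the separators in the result only when the pattern has a capture group. The variable names are just for illustration and are not part of the patch.

```python
import re

text = "123+456*/"

# With a capture group, each matched separator is kept in the result.
with_group = re.split(r"([^0-9])", text)

# Without a capture group, the separators are dropped and only the
# pieces between matches (including empty ones) remain.
without_group = re.split(r"[^0-9]", text)

print(with_group)
print(without_group)
```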

Anyone interested in it?


[-- Attachment #2: 0001-ADD-regexp-split.patch --]
[-- Type: text/x-patch, Size: 1571 bytes --]

From eb0bb80c86c9539712b78cf8902d230e0c4e778e Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Thu, 29 Dec 2011 17:25:03 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..5a90c67 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
            regexp-match? regexp-quote match:start match:end match:substring
            string-match regexp-substitute fold-matches list-matches
-           regexp-substitute/global))
+           regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,24 @@
                         (begin
                           (do-item (car items)) ; This is not.
                           (next-item (cdr items)))))))))))
+                          
+(define regexp-split
+  (lambda (regex str)
+    (let* ([len (string-length str)]
+	   [ret (fold-matches 
+		 regex str (list '() 0 0 '(""))
+		 (lambda (m prev)
+		   (let* ([ll (car prev)]
+			  [count (1+ (cadr prev))]
+			  [start (caddr prev)]
+			  [tail (match:suffix m)]
+			  [end (match:start m)]
+			  [s (string-copy str start end)]
+			  )
+		     (list `(,@ll ,s ,(match:substring m)) 
+			   count (match:end m) tail)
+		     )))] ;; end fold-matches
+	   ) ;; end let*
+      `(,@(car ret) ,(cadddr ret))
+      )))
+                                
-- 
1.7.0.4


* Re: [PATCH] add regexp-split
@ 2013-02-01  9:24 Nala Ginrut
  0 siblings, 0 replies; 31+ messages in thread
From: Nala Ginrut @ 2013-02-01  9:24 UTC (permalink / raw)
  To: guile-devel

I found a bug in my previous regexp-split implementation and have fixed it now:

-------------------------------------code---------------------------------
(define* (regexp-split regex str #:optional (flags 0))
  (let ((ret (fold-matches 
              regex str (list '() 0 str)
              (lambda (m prev)
                (let* ((ll (car prev))
                       (start (cadr prev))
                       (tail (match:suffix m))
                       (end (match:start m))
                       (s (substring/shared str start end))
                       (groups (map (lambda (n) (match:substring m n))
                                    (iota (1- (match:count m)) 1))))
                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
              flags)))
    `(,@(car ret) ,(caddr ret))))
-------------------------------------end---------------------------------

Now it works like Python's re.split:
(regexp-split "([^ ]+) (.+)" "a b[^ _]") 
==> ("" "a" "b[^ _]" "")

(regexp-split "([^0-9])([^+/*])" "123+456*/")
==> ("123" "+" "4" "56*/")
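
As a rough cross-check of the two results above, the same fold-over-matches idea can be sketched in Python and compared against re.split. The helper name split_like_fold is hypothetical and only mirrors the structure of the Guile code (piece before the match, then the captured groups, then the tail). It assumes the pattern never matches the empty string.

```python
import re

def split_like_fold(pattern, text):
    """Illustrative only: walk the matches left to right, as the
    fold-matches version does, collecting the piece before each match
    followed by that match's captured groups."""
    pieces = []
    start = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        pieces.append(text[start:m.start()])  # text before this match
        pieces.extend(m.groups())             # groups 1..n, not group 0
        start = m.end()
    pieces.append(text[start:])               # tail after the last match
    return pieces

print(split_like_fold(r"([^ ]+) (.+)", "a b[^ _]"))
print(split_like_fold(r"([^0-9])([^+/*])", "123+456*/"))
```

Collecting groups 1..n rather than the whole match corresponds to the (iota (1- (match:count m)) 1) call in the fixed version.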

I discussed with Andy that regexp-split is such a common thing that
we should add it to (ice-9 regex).

But there are three implementations so far: mine, cky's, and
this one:
http://lists.gnu.org/archive/html/guile-user/2011-03/msg00007.html

So... I'll leave the decision to the maintainers. ;-)

The difference between them may be this: cky's is Perl style (plus Ruby/Java),
and mine is Python style (though I hate Python ;-P).

It doesn't matter which of them is chosen; what really matters
is that we do need regexp-split in Guile.

Regards.


Nala Ginrut <nalaginrut <at> gmail.com> writes:

> 
> 
> Now that we have a previous thread on this topic, I think there's no need
> to format a patch.
> 
> Maybe this will solve the problem:
> 
> (define* (regexp-split regex str #:optional (flags 0))
>   (let ((ret (fold-matches
>               regex str (list '() 0 str)
>               (lambda (m prev)
>                 (let* ((ll (car prev))
>                        (start (cadr prev))
>                        (tail (match:suffix m))
>                        (end (match:start m))
>                        (s (substring/shared str start end))
>                        (groups (map (lambda (n) (match:substring m n))
>                                     (iota (1- (match:count m))))))
>                   (list `(,@ll ,s ,@groups) (match:end m) tail)))
>               flags)))
>     `(,@(car ret) ,(caddr ret))))
> 
> On Fri, Dec 30, 2011 at 11:33 PM, Daniel Hartwig <mandyke <at> gmail.com> wrote:
> On 30 December 2011 21:03, Neil Jerram <neil <at> ossau.homelinux.net> wrote:
> 
> > Nala Ginrut <nalaginrut <at> gmail.com> writes:
> >
> >> hi guilers!
> >> It seems like there's no "regexp-split" procedure in Guile.
> >> What we have is "string-split" which accepted Char only.
> >> So I wrote one for myself.
> >
> > We've had this topic before, and it only needs a search for
> > "regex-split guile" to find it:
> > http://old.nabble.com/regex-split-for-Guile-td31093245.html.
> >
> Good to see that there is continuing interest in this feature.
> IMO, the implementation here is more elegant and readable for its use
> of `fold-matches'.  The first implementation from the thread you
> mention effectively rolls its own version of `fold-matches' over the
> result of `list-matches' (which is implemented using `fold-matches'!).





end of thread, other threads:[~2013-02-01  9:24 UTC | newest]

Thread overview: 31+ messages
2011-12-29  9:32 [PATCH] add regexp-split Nala Ginrut
2011-12-29  9:46 ` Nala Ginrut
2011-12-29 10:20   ` Nala Ginrut
2011-12-29 13:58     ` Nala Ginrut
2011-12-30  5:34       ` Daniel Hartwig
2011-12-30  8:46         ` Nala Ginrut
2011-12-30  9:05           ` Nala Ginrut
     [not found]           ` <CAN3veRdFQyOthFTSLE7v9x3_A4HTPX99DSmDx26dBkeyy=MTDQ@mail.gmail.com>
2011-12-30  9:42             ` Daniel Hartwig
2011-12-30 11:40               ` Nala Ginrut
2011-12-30 11:47                 ` Nala Ginrut
2011-12-30 15:23                   ` Daniel Hartwig
2011-12-30 10:14 ` Marijn
2011-12-30 10:56   ` Nala Ginrut
2011-12-30 11:48     ` Marijn
2011-12-30 11:52       ` Nala Ginrut
2011-12-30 13:23         ` Marijn
2011-12-30 14:57           ` Daniel Hartwig
2011-12-31  1:46             ` Daniel Hartwig
2011-12-31  2:32               ` Eli Barzilay
2011-12-31  3:16                 ` Daniel Hartwig
2011-12-31  3:21                   ` Eli Barzilay
2011-12-31  4:37                     ` Daniel Hartwig
2011-12-31  7:00                       ` Eli Barzilay
2011-12-30 13:03 ` Neil Jerram
2011-12-30 15:12   ` Nala Ginrut
2011-12-30 16:26     ` Neil Jerram
2011-12-30 16:46       ` Nala Ginrut
2012-01-07 22:44     ` Andy Wingo
2011-12-30 15:33   ` Daniel Hartwig
2011-12-30 15:58     ` Nala Ginrut
  -- strict thread matches above, loose matches on Subject: below --
2013-02-01  9:24 Nala Ginrut
