Well, I realized that was a mistake: we can use fold-matches anyway.
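A minimal sketch of why fold-matches still works: in (ice-9 regex), fold-matches accepts either a pattern string or a regexp object already compiled with make-regexp, so case-insensitivity can be set at compile time with regexp/icase. The helper name all-matches is hypothetical, used only for illustration.

```scheme
(use-modules (ice-9 regex))

;; fold-matches takes a pattern string or a compiled regexp.
;; Compiling with regexp/icase makes the matching case-insensitive;
;; the optional FLAGS argument of fold-matches is only for match-time
;; flags such as regexp/notbol.
(define rx (make-regexp "[a-f]+" regexp/icase))

;; Hypothetical helper: collect every matched substring, in order.
(define (all-matches re str)
  (reverse (fold-matches re str '()
                         (lambda (m acc) (cons (match:substring m) acc)))))

(all-matches rx "0a3B9")   ; => ("a" "B")
```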
hi Daniel! Very glad to see your reply.
1. I also think the order (regexp str) is strange, but it follows the Python version. And I think 'string-match' also puts the regexp before the string. Anyway, that's an easy fix.
2. I think implementing the flags as in the Python version is a little different, since the "ignorecase" flag must be passed to make-regexp. So we can't use fold-matches.
Hmm... let me see what I can do...

On Fri, Dec 30, 2011 at 1:34 PM, Daniel Hartwig <mandyke@gmail.com> wrote:
Hello
Nice work! I have a couple of comments :-)
>>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut <nalaginrut@gmail.com>
>>> wrote:
>>>>
>>>> hi guilers!
>>>> It seems like there's no "regexp-split" procedure in Guile.
>>>> What we have is "string-split", which accepts only a char.
>>>> So I wrote one for myself.
>>>>
>>>> ------python code-----
>>>> >>> import re
>>>> >>> re.split("([^0-9])", "123+456*/")
>>>> ['123', '+', '456', '*', '', '/', '']
>>>> --------code end-------
>>>>
>>>> The Guile version:
>>>>
>>>> ----------guile code-------
>>>> (regexp-split "([^0-9])" "123+456*/")
>>>> ==>("123" "+" "456" "*" "" "/" "")
>>>> ----------code end--------
>>>>
>>>> Anyone interested in it?
>>>>
The matched pattern/delimiter is always included in the output, whether or not the pattern contains a capturing group:
scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
$21 = ("Words" ", " "words" ", " "words" "." "")
scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
$22 = ("Words" ", " "words" ", " "words" "." "")
However, a user is not always interested in the delimiter. Consider
the example given for string-split:
scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
$23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")
This behaviour can be obtained with list-matches on the complement of
REGEXP.
scheme@(guile-user)> (map match:substring
                          (list-matches "\\w+" "Words, words, words."))
$24 = ("Words" "words" "words")
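As an alternative to list-matches on the complement, a minimal sketch can split on the delimiter itself and discard it, the way string-split does with a char. The name split-dropping-delimiters is hypothetical; it uses only fold-matches, match:start and match:end from (ice-9 regex). Unlike the complement approach, it keeps empty fields, such as the trailing "" after the final ".".

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: split STR at each match of RE and drop the
;; delimiters, like string-split does with a single char.  Empty fields
;; are kept, so "a::b" split on ":" gives ("a" "" "b").
(define (split-dropping-delimiters re str)
  (let ((state
         ;; STATE is (reversed-fields . end-of-previous-match).
         (fold-matches re str (cons '() 0)
                       (lambda (m state)
                         (cons (cons (substring str (cdr state) (match:start m))
                                     (car state))
                               (match:end m))))))
    ;; Add the field after the last match, then restore the original order.
    (reverse (cons (substring str (cdr state)) (car state)))))

(split-dropping-delimiters "\\W+" "Words, words, words.")
;; => ("Words" "words" "words" "")
```

This matches Python's re.split('\W+', ...) result for the same input.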
I would like to see your version support the Python semantics [1]:
> If capturing parentheses are used in pattern, then the text of
> all groups in the pattern are also returned as part of the resulting
> list.
[...]
> >>> re.split('\W+', 'Words, words, words.')
> ['Words', 'words', 'words', '']
> >>> re.split('(\W+)', 'Words, words, words.')
> ['Words', ', ', 'words', ', ', 'words', '.', '']
> >>> re.split('((,)?\W+?)', 'Words, words, words.')
> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']
For the sake of consistency with the rest of the module, perhaps
support the `flags' option (just pass it to fold-matches) and use the
same variable names, etc.:
(define* (regexp-split regexp string #:optional (flags 0))
...
instead of:
(define regexp-split
  (lambda (regex str)
    ...
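A minimal sketch of how the suggested signature could carry the Python semantics quoted above, assuming fold-matches is used as suggested: the text between matches is returned, and when REGEXP has capturing groups their text is inserted after each piece, with #f standing in for Python's None. REGEXP may be a string or a regexp compiled with regexp/icase, which covers the ignorecase case; FLAGS goes to fold-matches unchanged. This is an illustration, not the patch under review.

```scheme
(use-modules (ice-9 regex)    ; fold-matches, match:* accessors
             (ice-9 optargs)) ; define* (built in on Guile 2.0+, harmless there)

;; Sketch only: regexp-split with Python re.split semantics.
;; REGEXP may be a pattern string or a compiled regexp, so a
;; case-insensitive split can use (make-regexp pat regexp/icase).
;; FLAGS is handed to fold-matches unchanged (match-time flags only).
(define* (regexp-split regexp string #:optional (flags 0))
  ;; Text of capturing groups 1..n of match M, in order; #f for a group
  ;; that did not participate (Python returns None there).
  (define (group-texts m)
    (let loop ((i (- (match:count m) 1)) (acc '()))
      (if (< i 1)
          acc
          (loop (- i 1) (cons (match:substring m i) acc)))))
  (let ((state
         ;; STATE is (reversed-pieces . end-of-previous-match).
         (fold-matches
          regexp string (cons '() 0)
          (lambda (m state)
            (let ((field (substring string (cdr state) (match:start m))))
              (cons (append (reverse (group-texts m))
                            (cons field (car state)))
                    (match:end m))))
          flags)))
    ;; Add the text after the last match and restore left-to-right order.
    (reverse (cons (substring string (cdr state)) (car state)))))

(regexp-split "([^0-9])" "123+456*/")
;; => ("123" "+" "456" "*" "" "/" "")
(regexp-split (make-regexp "[a-f]+" regexp/icase) "0a3B9")
;; => ("0" "3" "9")
```

Because fold-matches visits matches left to right, the pieces are accumulated in reverse and flipped once at the end, which keeps the whole split linear in the number of matches.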
Also, the name seems unintuitive to me: it is STR being split, not
RE. Perhaps this could be folded into the existing string-split
function.
A nice patch nonetheless!
[1] http://docs.python.org/library/re.html#re.split