From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] add regexp-split
Date: Fri, 30 Dec 2011 16:46:53 +0800
To: Daniel Hartwig
Cc: guile-devel

Hi Daniel! Very glad to see your reply.
1. I also think the argument order (regexp str) is strange, but it follows the Python version.
And I think 'string-match' also puts the regexp before the string. Anyway, that's an easy fix.
2. I think implementing a flags argument like the Python version is a little different, since the "ignorecase" flag must
be passed to make-regexp. So we can't use fold-matches.
Hmm... let me see what I can do...
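One possible way around this, sketched here assuming Guile's (ice-9 regex) module: compile flags such as regexp/icase are indeed given to make-regexp, but list-matches and fold-matches also accept an already-compiled regexp (their own optional flags argument goes to regexp-exec instead). So the case-insensitive regexp can be compiled first and then matched or folded over. The names words-rx, found and n-matches are illustrative only.

```scheme
(use-modules (ice-9 regex))

;; Compile-time flags such as regexp/icase are given to make-regexp.
(define words-rx (make-regexp "words" regexp/icase))

;; list-matches accepts the compiled regexp, so the flag is honoured.
(define found
  (map match:substring (list-matches words-rx "Words, words, WORDS.")))

;; fold-matches accepts it as well; here it simply counts the matches.
(define n-matches
  (fold-matches words-rx "Words, words, WORDS." 0
                (lambda (m n) (+ n 1))))
```

Here found is ("Words" "words" "WORDS") and n-matches is 3.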

On Fri, Dec 30, 2011 at 1:34 PM, Daniel Hartwig <mandyke@gmail.com> wrote:
Hello

>>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut <nalaginrut@gmail.com>
>>> wrote:
>>>>
>>>> hi guilers!
>>>> It seems like there's no "regexp-split" procedure in Guile.
>>>> What we have is "string-split", which accepts only a char.
>>>> So I wrote one for myself.
>>>>
>>>> ------python code-----
>>>> >>> import re
>>>> >>> re.split("([^0-9])", "123+456*/")
>>>> ['123', '+', '456', '*', '', '/', '']
>>>> --------code end-------
>>>>
>>>> The Guile version:
>>>>
>>>> ----------guile code-------
>>>> (regexp-split "([^0-9])" "123+456*/")
>>>> ==>("123" "+" "456" "*" "" "/" "")
>>>> ----------code end--------
>>>>
>>>> Anyone interested in it?
>>>>

Nice work! I have a couple of comments :-)


The matched pattern/delimiter is included in the output:

scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
$21 = ("Words" ", " "words" ", " "words" "." "")
scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
$22 = ("Words" ", " "words" ", " "words" "." "")

However, a user is not always interested in the delimiter. Consider
the example given for string-split:

scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
$23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")

This behaviour can be obtained with list-matches on the complement of
REGEXP.

scheme@(guile-user)> (map match:substring
                          (list-matches "\\w+" "Words, words, words."))
$24 = ("Words" "words" "words")
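A small caveat, sketched below assuming Guile's (ice-9 regex) module: matching the complement only reproduces string-split when no two delimiters are adjacent, because list-matches never returns the empty fields that string-split keeps. The names by-char and by-complement are illustrative.

```scheme
(use-modules (ice-9 regex))

;; string-split keeps the empty field between two adjacent colons.
(define by-char (string-split "a::b" #\:))

;; Matching the complement "[^:]+" returns only the non-empty fields.
(define by-complement
  (map match:substring (list-matches "[^:]+" "a::b")))
```

Here by-char is ("a" "" "b") while by-complement is ("a" "b").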

I would like to see your version support the Python semantics [1]:

> If capturing parentheses are used in pattern, then the text of
> all groups in the pattern are also returned as part of the resulting
> list.
[...]
> >>> re.split('\W+', 'Words, words, words.')
> ['Words', 'words', 'words', '']
> >>> re.split('(\W+)', 'Words, words, words.')
> ['Words', ', ', 'words', ', ', 'words', '.', '']

>>> re.split('((,)?\W+?)', 'Words, words, words.')
['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']


For the sake of consistency with the rest of the module, perhaps
support the `flags' option (just pass it to fold-matches) and use the same variable names, etc.:

(define* (regexp-split regexp string #:optional (flags 0))
  ...

instead of:

(define regexp-split
  (lambda (regex str)
    ...
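Combining the two suggestions (Python's capture-group semantics and a `flags' argument), one possible shape is sketched below. It assumes Guile's (ice-9 regex) module and is not the code from the patch. FLAGS is treated here as compile flags for make-regexp (e.g. regexp/icase), which sidesteps the ignore-case problem raised at the top of this reply: the regexp is compiled first, and fold-matches is then given the compiled regexp.

```scheme
(use-modules (ice-9 regex))

;; Sketch of a Python-style regexp-split (not the code from the patch).
;; REGEXP is a pattern string or a compiled regexp; STR is the string
;; being split.  FLAGS are compile flags such as regexp/icase and are
;; only used when REGEXP is a string.
(define* (regexp-split regexp str #:optional (flags 0))
  (let ((rx (cond ((regexp? regexp) regexp)
                  ;; With FLAGS = 0, call make-regexp with no flag so it
                  ;; keeps its default (extended) syntax.
                  ((zero? flags) (make-regexp regexp))
                  (else (make-regexp regexp flags)))))
    (let ((state
           ;; The accumulator is (end-of-previous-match . pieces-reversed).
           (fold-matches
            rx str (cons 0 '())
            (lambda (m acc)
              ;; First the text between the previous match and this one.
              (let loop ((i 1)
                         (pieces (cons (substring str (car acc)
                                                  (match:start m))
                                       (cdr acc))))
                ;; Then the text of each capturing group; a group that
                ;; did not take part in the match gives #f (Python's None).
                (if (< i (match:count m))
                    (loop (+ i 1) (cons (match:substring m i) pieces))
                    (cons (match:end m) pieces)))))))
      ;; Whatever follows the last match is always the final piece.
      (reverse (cons (substring str (car state)) (cdr state))))))
```

Without a capturing group the delimiters are dropped, and with one they are kept, as in the re.split outputs quoted above; for example (regexp-split "([^0-9])" "123+456*/") gives ("123" "+" "456" "*" "" "/" "").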


Also, to me the name seems unintuitive -- it is STR being split, not
RE -- perhaps this can be folded into the existing string-split
function.


A nice patch nonetheless!


[1] http://docs.python.org/library/re.html#re.split
