From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Hartwig
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] add regexp-split
Date: Fri, 30 Dec 2011 13:34:12 +0800
To: Nala Ginrut
Cc: guile-devel

Hello

>>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut wrote:
>>>>
>>>> hi guilers!
>>>> It seems like there's no "regexp-split" procedure in Guile.
>>>> What we have is "string-split" which accepted Char only.
>>>> So I wrote one for myself.
>>>>
>>>> ------python code-----
>>>> >>> import re
>>>> >>> re.split("([^0-9])", "123+456*/")
>>>> ['123', '+', '456', '*', '', '/', '']
>>>> --------code end-------
>>>>
>>>> The Guile version:
>>>>
>>>> ----------guile code-------
>>>> (regexp-split "([^0-9])" "123+456*/")
>>>> ==>("123" "+" "456" "*" "" "/" "")
>>>> ----------code end--------
>>>>
>>>> Anyone interested in it?
>>>>

Nice work!  I have a couple of comments :-)

The matched pattern/delimiter is included in the output, whether or
not REGEXP contains capturing parentheses:

scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
$21 = ("Words" ", " "words" ", " "words" "." "")
scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
$22 = ("Words" ", " "words" ", " "words" "." "")

However, a user is not always interested in the delimiter.
Consider the example given for string-split:

scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
$23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")

Output without the delimiters can be obtained with list-matches on
the complement of REGEXP:

scheme@(guile-user)> (map match:substring (list-matches "\\w+" "Words, words, words."))
$24 = ("Words" "words" "words")

I would like to see your version support the Python semantics [1]:

> If capturing parentheses are used in pattern, then the text of
> all groups in the pattern are also returned as part of the resulting
> list.
[...]
> >>> re.split('\W+', 'Words, words, words.')
> ['Words', 'words', 'words', '']
> >>> re.split('(\W+)', 'Words, words, words.')
> ['Words', ', ', 'words', ', ', 'words', '.', '']

>>> re.split('((,)?\W+?)', 'Words, words, words.')
['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']

For the sake of consistency with the rest of the module, perhaps
support the `flags' option (just pass it to fold-matches) and use the
same variable names, etc.:

(define* (regexp-split regexp string #:optional (flags 0))
  ...

instead of:

(define regexp-split
  (lambda (regex str)
    ...

Also, to me the name seems unintuitive -- it is STR being split, not
RE -- perhaps this can be folded into the existing string-split
function.

A nice patch nonetheless!

[1] http://docs.python.org/library/re.html#re.split