From: nalaginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: regexp-split for Guile
Date: Tue, 18 Sep 2012 20:59:33 +0800
Organization: HFG
Message-ID: <1347973173.2333.3.camel@Renee-SUSE.suse>
References: <20120917140133.GA6315@yarrow>
In-Reply-To: <20120917140133.GA6315@yarrow>
To: "Chris K. Jester-Young"
Cc: guile-devel@gnu.org

I raised the same topic before:
http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00247.html

Actually, there is an even older thread than mine:
http://old.nabble.com/regex-split-for-Guile-td31093245.html

Anyway, since so many people want this nice feature, why don't we add it
(using any one of these three implementations) to ice-9?

On Mon, 2012-09-17 at 10:01 -0400, Chris K.
Jester-Young wrote:
> Hi there,
>
> I'm currently implementing regexp-split for Guile, which provides a
> Perl-style split function (including correctly implementing the "limit"
> parameter), minus the special awk-style whitespace handling (that is
> used with a pattern of " ", as opposed to / /, with Perl's split).
>
> Attached is a couple of patches, to support the regexp-split function
> which I'm proposing at the bottom of this message:
>
> 1. The first fixes the behaviour of fold-matches and list-matches when
>    the pattern contains a ^ (identical to the patch in my last email).
> 2. The second adds the ability to limit the number of matches done.
>    This applies on top of the first patch.
>
> Some comments about the regexp-split implementation: the value that's
> being passed to regexp-split-fold is a cons, where the car is the last
> match's end position, and the cdr is the substrings so far collected.
>
> The special check in regexp-split-fold for match-end being zero is to
> emulate a specific behaviour as documented for Perl's split: "Empty
> leading fields are produced when there are positive-width matches at
> the beginning of the string; a zero-width match at the beginning of the
> string does not produce an empty field."
>
> Below is the implementation; comments are welcome! If it all looks good,
> I'll write tests and documentation, with a view to eventually putting it
> into (ice-9 regex).
>
> Thanks,
> Chris.
>
> * * *
>
> (define (regexp-split-fold match prev)
>   (if (zero? (match:end match)) prev
>       (cons* (match:end match)
>              (substring (match:string match) (car prev) (match:start match))
>              (cdr prev))))
>
> (define (string-empty? str)
>   (zero? (string-length str)))
>
> (define* (regexp-split pat str #:optional (limit 0))
>   (let* ((result (fold-matches pat str '(0) regexp-split-fold 0
>                                (if (positive? limit) (1- limit) #f)))
>          (final (cons (substring str (car result)) (cdr result))))
>     (reverse! (if (zero? limit) (drop-while string-empty? final) final))))
> differences between files attachment
> (0001-In-fold-matches-set-regexp-notbol-unless-matching-st.patch)
> From da8b0cd523f6e9bf9e1d46829cccf01e3115c614 Mon Sep 17 00:00:00 2001
> From: "Chris K. Jester-Young"
> Date: Sun, 16 Sep 2012 02:20:56 -0400
> Subject: [PATCH 1/2] In fold-matches, set regexp/notbol unless matching
>  string start.
>
> * module/ice-9/regex.scm (fold-matches): Set regexp/notbol if the
>   starting position is nonzero.
> * test-suite/tests/regexp.test (fold-matches): Check that when
>   matching /^foo/ against "foofoofoofoo", only one match results.
> ---
>  module/ice-9/regex.scm       | 3 ++-
>  test-suite/tests/regexp.test | 9 ++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
> index f7b94b7..08ae2c2 100644
> --- a/module/ice-9/regex.scm
> +++ b/module/ice-9/regex.scm
> @@ -172,8 +172,9 @@
>      (let loop ((start 0)
>                 (value init)
>                 (abuts #f)) ; True if start abuts a previous match.
> +      (define bol (if (zero? start) 0 regexp/notbol))
>        (let ((m (if (> start (string-length string)) #f
> -                   (regexp-exec regexp string start flags))))
> +                   (regexp-exec regexp string start (logior flags bol)))))
>          (cond
>           ((not m) value)
>           ((and (= (match:start m) (match:end m)) abuts)
> diff --git a/test-suite/tests/regexp.test b/test-suite/tests/regexp.test
> index ef59465..d549df2 100644
> --- a/test-suite/tests/regexp.test
> +++ b/test-suite/tests/regexp.test
> @@ -132,7 +132,14 @@
>                            (lambda (match result)
>                              (cons (match:substring match)
>                                    result))
> -                          (logior regexp/notbol regexp/noteol)))))
> +                          (logior regexp/notbol regexp/noteol))))
> +
> +  (pass-if "regexp/notbol is set correctly"
> +    (equal? '("foo")
> +            (fold-matches "^foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result))))))
>
>
>  ;;;
> differences between files attachment
> (0002-Add-limit-parameter-to-fold-matches-and-list-matches.patch)
> From 147dc0d7fd9ab04d10b4f13cecf47a32c5b6c4b6 Mon Sep 17 00:00:00 2001
> From: "Chris K. Jester-Young"
> Date: Mon, 17 Sep 2012 01:06:07 -0400
> Subject: [PATCH 2/2] Add "limit" parameter to fold-matches and list-matches.
>
> * doc/ref/api-regex.texi: Document new "limit" parameter.
>
> * module/ice-9/regex.scm (fold-matches, list-matches): Optionally take
>   a "limit" argument that, if specified, limits how many times the
>   pattern is matched.
>
> * test-suite/tests/regexp.test (fold-matches): Add tests for the correct
>   functioning of the limit parameter.
> ---
>  doc/ref/api-regex.texi       | 10 ++++++----
>  module/ice-9/regex.scm       | 18 ++++++++++--------
>  test-suite/tests/regexp.test | 16 +++++++++++++++-
>  3 files changed, 31 insertions(+), 13 deletions(-)
>
> diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
> index 082fb87..2d2243f 100644
> --- a/doc/ref/api-regex.texi
> +++ b/doc/ref/api-regex.texi
> @@ -189,11 +189,12 @@ or @code{#f} otherwise.
>  @end deffn
>
>  @sp 1
> -@deffn {Scheme Procedure} list-matches regexp str [flags]
> +@deffn {Scheme Procedure} list-matches regexp str [flags [limit]]
>  Return a list of match structures which are the non-overlapping
>  matches of @var{regexp} in @var{str}. @var{regexp} can be either a
>  pattern string or a compiled regexp. The @var{flags} argument is as
> -per @code{regexp-exec} above.
> +per @code{regexp-exec} above. The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.
>
>  @example
>  (map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
> @@ -201,11 +202,12 @@ per @code{regexp-exec} above.
>  @end example
>  @end deffn
>
> -@deffn {Scheme Procedure} fold-matches regexp str init proc [flags]
> +@deffn {Scheme Procedure} fold-matches regexp str init proc [flags [limit]]
>  Apply @var{proc} to the non-overlapping matches of @var{regexp} in
>  @var{str}, to build a result. @var{regexp} can be either a pattern
>  string or a compiled regexp. The @var{flags} argument is as per
> -@code{regexp-exec} above.
> +@code{regexp-exec} above. The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.
>
>  @var{proc} is called as @code{(@var{proc} match prev)} where
>  @var{match} is a match structure and @var{prev} is the previous return
> diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
> index 08ae2c2..0ffe74c 100644
> --- a/module/ice-9/regex.scm
> +++ b/module/ice-9/regex.scm
> @@ -167,26 +167,28 @@
>  ;;; `b'. Around or within `xxx', only the match covering all three
>  ;;; x's counts, because the rest are not maximal.
>
> -(define* (fold-matches regexp string init proc #:optional (flags 0))
> +(define* (fold-matches regexp string init proc #:optional (flags 0) limit)
>    (let ((regexp (if (regexp? regexp) regexp (make-regexp regexp))))
>      (let loop ((start 0)
> +               (count 0)
>                 (value init)
>                 (abuts #f)) ; True if start abuts a previous match.
> -      (define bol (if (zero? start) 0 regexp/notbol))
> -      (let ((m (if (> start (string-length string)) #f
> -                   (regexp-exec regexp string start (logior flags bol)))))
> +      (let* ((bol (if (zero? start) 0 regexp/notbol))
> +             (m (and (or (not limit) (< count limit))
> +                     (<= start (string-length string))
> +                     (regexp-exec regexp string start (logior flags bol)))))
>          (cond
>           ((not m) value)
>           ((and (= (match:start m) (match:end m)) abuts)
>            ;; We matched an empty string, but that would overlap the
>            ;; match immediately before. Try again at a position
>            ;; further to the right.
> -          (loop (+ start 1) value #f))
> +          (loop (1+ start) count value #f))
>           (else
> -          (loop (match:end m) (proc m value) #t)))))))
> +          (loop (match:end m) (1+ count) (proc m value) #t)))))))
>
> -(define* (list-matches regexp string #:optional (flags 0))
> -  (reverse! (fold-matches regexp string '() cons flags)))
> +(define* (list-matches regexp string #:optional (flags 0) limit)
> +  (reverse! (fold-matches regexp string '() cons flags limit)))
>
>  (define (regexp-substitute/global port regexp string . items)
>
> diff --git a/test-suite/tests/regexp.test b/test-suite/tests/regexp.test
> index d549df2..c3ba698 100644
> --- a/test-suite/tests/regexp.test
> +++ b/test-suite/tests/regexp.test
> @@ -139,7 +139,21 @@
>              (fold-matches "^foo" "foofoofoofoo" '()
>                            (lambda (match result)
>                              (cons (match:substring match)
> -                                  result))))))
> +                                  result)))))
> +
> +  (pass-if "without limit"
> +    (equal? '("foo" "foo" "foo" "foo")
> +            (fold-matches "foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result)))))
> +
> +  (pass-if "with limit"
> +    (equal? '("foo" "foo")
> +            (fold-matches "foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result)) 0 2))))
>
>
>  ;;;
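
For reference, the accumulator described in the quoted message (a pair
whose car is the end position of the last match and whose cdr is the list
of fields collected so far) can be sketched against an unpatched
(ice-9 regex), using only the existing five-argument form of fold-matches.
This is only an illustration under stated assumptions: split-on-matches is
a hypothetical name, not part of Guile and not the proposed regexp-split,
and it deliberately leaves out the "limit" parameter and the Perl-style
treatment of zero-width matches and trailing empty fields.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper, not part of Guile: split STR at every match of PAT.
;; The accumulator is a pair: the car is the end position of the last
;; match, the cdr is the list of fields so far (most recent first).
(define (split-on-matches pat str)
  (let* ((step (lambda (m acc)
                 (cons (match:end m)
                       (cons (substring str (car acc) (match:start m))
                             (cdr acc)))))
         (result (fold-matches pat str '(0) step)))
    ;; Add the text after the last match, then restore left-to-right order.
    (reverse (cons (substring str (car result)) (cdr result)))))

(split-on-matches "," "a,b,c")
;; => ("a" "b" "c")
```

Carrying the last end position in the car means each step needs only the
current match to cut the next field, which is why a plain fold is enough
and no separate index variable is required.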