From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: regexp-split for Guile
Date: Fri, 12 Oct 2012 17:57:11 -0400
Message-ID: <87y5jbfj60.fsf@tines.lan>
References: <20120917140133.GA6315@yarrow> <87lig830ox.fsf@zigzag.favinet>
 <20120917200603.GB6315@yarrow> <20120918195915.GE6315@yarrow>
In-Reply-To: (Daniel Hartwig's message of "Sun, 7 Oct 2012 10:38:43 +0800")
To: Daniel Hartwig
Cc: guile-devel@gnu.org

Daniel Hartwig writes:

> On 19 September 2012 03:59, Chris K. Jester-Young wrote:
>> (define* (regexp-split pat str #:optional (limit 0))
>>   […]
>>   (reverse (if (zero? limit)
>>                (drop-while string-null? final)
>>                final))))
>>
>
> Please simplify this limit arg, removing the maybe-drop-empty-strings
> behaviour.  Either positive limit or #f for all matches.  It is
> trivial for the caller to remove the empty strings if desired, and
> simplifies the docs for regexp-split.  Matching perl semantics is not
> necessarily desirable.

FWIW, I agree with Daniel.

I dislike the complicated semantics of this 'limit' argument, which
combines into a single number two different concepts:

* What limiting mode to use:
    [A] return 'limit' many fields at most
    [B] return all fields
    [C] return all fields except trailing blank fields

* How many fields, if using limiting mode [A].

Beyond matters of taste, I don't like this because it makes bugs less
likely to be caught.  Suppose 'limit' is a computed value, normally
expected to be positive.  Code that follows may implicitly assume that
the returned list has no more than 'limit' elements.  Now suppose that
due to a bug or exceptional circumstance, the computed 'limit' ends up
being less than 1.
Now 'regexp-split' switches to a qualitatively different mode of
behavior.

I'd prefer for a numeric limit to be interpreted in a uniform way.  That
suggests that a non-positive 'limit' should raise an exception.

Limiting modes [B] and [C] could be indicated in a few different ways.
One possibility would be to pass special symbol values for the 'limit'
argument to indicate these two other modes.

Another possibility is to add a 'drop-right-while' procedure (analogous
to SRFI-1's 'drop-while'), and then users who want this could do:

  (drop-right-while string-null? (regexp-split ...))

    Regards,
      Mark
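
For illustration, the two suggestions in the message could be sketched
roughly as follows in Guile.  This is only a sketch under assumptions:
'drop-right-while' is not part of SRFI-1, 'validate-limit' is a
hypothetical helper name, and neither definition is taken from the patch
under discussion.  A literal list stands in for the result of
'regexp-split'.

```scheme
(use-modules (srfi srfi-1))   ; provides drop-while

;; Hypothetical helper, not in SRFI-1: drop the longest suffix of LST
;; whose elements all satisfy PRED.
(define (drop-right-while pred lst)
  (reverse (drop-while pred (reverse lst))))

;; Hypothetical check: accept #f (no limit) or a positive exact integer,
;; and signal an error for anything else rather than silently switching
;; to a different limiting mode.
(define (validate-limit limit)
  (if (or (not limit)
          (and (integer? limit) (exact? limit) (positive? limit)))
      limit
      (error "regexp-split: limit must be #f or a positive integer:"
             limit)))

;; Mode [C], applied to a stand-in for a regexp-split result.
(drop-right-while string-null? '("a" "" "b" "" ""))
;; => ("a" "" "b")
```

Reversing twice keeps the helper short at the cost of two list
traversals; interior empty strings are preserved, and only the trailing
run is removed.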