From: "Chris K. Jester-Young"
Newsgroups: gmane.lisp.guile.devel
Subject: Re: regexp-split for Guile
Date: Sat, 20 Oct 2012 00:01:26 -0400
Message-ID: <20121020040126.GA25831@yarrow>
References: <20120917140133.GA6315@yarrow> <87lig830ox.fsf@zigzag.favinet> <20120917200603.GB6315@yarrow> <20120918195915.GE6315@yarrow> <87y5jbfj60.fsf@tines.lan>
In-Reply-To: <87y5jbfj60.fsf@tines.lan>
To: guile-devel@gnu.org

On Fri, Oct 12, 2012 at 05:57:11PM -0400, Mark H Weaver wrote:
> FWIW, I agree with Daniel.  I dislike the complicated semantics of this
> 'limit' argument, which combines into a single number two different
> concepts:

First, I want to thank both Daniel and Mark for their feedback.
I'm sorry I haven't had a chance to reply until now; last weekend I went
to (and presented at) RacketCon, so I didn't have a lot of time for
replying to emails. (And if you want to see my RacketCon presentation,
feel free to visit https://speakerdeck.com/u/cky/p/rackona :-))

> Beyond matters of taste, I don't like this because it makes bugs less
> likely to be caught.  Suppose 'limit' is a computed value, normally
> expected to be positive.  Code that follows may implicitly assume that
> the returned list has no more than 'limit' elements.  Now suppose that
> due to a bug or exceptional circumstance, the computed 'limit' ends up
> being less than 1.  Now 'regexp-split' switches to a qualitatively
> different mode of behavior.

I am sympathetic to this. It would definitely be good for the limit to
mean only that, and not have two other meanings attached to it.

So, in this spirit, below is my proposal for something that I hope fits
the character of your feedback without making the common use cases
needlessly verbose: we should favour the common use cases by making
them easy to use.

Before I begin, remember that in Perl's split, the default limit is 0,
which strips off all the blank trailing fields. This is the common use
case when using whitespace as a delimiter, where you simply want to
ignore all the end-of-line whitespace. Making the calling code manually
call drop-right-while is counter-productive for this common use case.

Here is my proposal:

    (regexp-split pat str #:key limit (trim? (not limit)))

With no optional arguments specified (so, #:limit is #f and #:trim? is
#t), it behaves like limit == 0 in Perl, i.e., it returns all fields
minus blank trailing ones.

With a #:limit specified (which must be a positive integer), return at
most that number of fields (subsequent ones are not split out, and are
returned as part of the last field, with all delimiters intact).

With #:trim? given a false value, return all fields, including blank
trailing ones.
#:trim? defaults to false iff #:limit is specified.

Rationale: The common use case is the most succinct version. The next
most common use case has a relatively short formulation (#:trim?).
Also, the default for #:trim? is based on common use cases depending on
whether #:limit is specified. (Trim-with-limit is not supported in
Perl, but it seemed to take more work to ban it here than to just let
it be.)

Examples:

    (regexp-split " +" "foo bar baz ")
    => ("foo" "bar" "baz")
    (regexp-split " +" "foo bar baz " #:trim? #f)
    => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo bar baz " #:limit 4)
    => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo bar baz " #:limit 4 #:trim? #t)
    => ("foo" "bar" "baz")
    (regexp-split " +" "foo bar baz " #:limit 3)
    => ("foo" "bar" "baz ")
    (regexp-split " +" "foo bar baz " #:limit 2)
    => ("foo" "bar baz ")

Does that sound reasonable?

Comments welcome,
Chris.
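
For concreteness, the semantics described above could be sketched in Guile roughly as follows. This is only an illustration, not the patch under discussion. It assumes list-matches, match:start and match:end from (ice-9 regex) and drop-while from SRFI-1; the loop structure is made up for the example, and patterns that can match the empty string are not handled.

```scheme
(use-modules (ice-9 regex)    ; list-matches, match:start, match:end
             (srfi srfi-1))   ; drop-while

;; Hypothetical sketch of the proposed interface.
;; #:limit  - #f, or a positive integer giving the maximum number of fields.
;; #:trim?  - drop blank trailing fields; defaults to #t only without #:limit.
(define* (regexp-split pat str #:key limit (trim? (not limit)))
  (let loop ((matches (list-matches pat str)) ; delimiter matches, left to right
             (start 0)                        ; where the current field begins
             (count 1)                        ; fields so far, incl. the current one
             (fields '()))                    ; completed fields, in reverse order
    (if (or (null? matches)
            (and limit (>= count limit)))
        ;; Stop splitting: the rest of the string is the last field,
        ;; with any remaining delimiters left intact.
        (let ((rev (cons (substring str start) fields)))
          (reverse (if trim?
                       (drop-while string-null? rev) ; blank trailing fields
                       rev)))
        (let ((m (car matches)))
          (loop (cdr matches)
                (match:end m)
                (+ count 1)
                (cons (substring str start (match:start m)) fields))))))
```

Collecting the fields in reverse means the trailing blanks sit at the head of the list, so trimming is a plain drop-while before the final reverse.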