From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: regexp-split for Guile
Date: Sat, 20 Oct 2012 10:16:49 -0400
Message-ID: <87pq4djkj2.fsf@tines.lan>
To: guile-devel@gnu.org
In-Reply-To: <87txtpjmsx.fsf@tines.lan> (Mark H. Weaver's message of "Sat, 20 Oct 2012 09:27:42 -0400")

I wrote:
> (regexp-split " +" " foo bar baz " #:limit 3 #:trim 'both)
>   => ("foo" "bar" "baz")
> (regexp-split " +" " foo bar baz " #:limit 2 #:trim 'both)
>   => ("foo" "bar")

Sorry, that last example is wrong of course, but both of these examples
raise an interesting question about how #:limit and #:trim should
interact.  To my mind, the top example above is correct.  I think the
last result should be "baz", not "baz ".

I guess I'd prefer to think of #:trim as trimming *before* splitting,
instead of trimming empty elements *after* splitting, so:

  (regexp-split " +" " foo bar baz " #:limit 3 #:trim 'both)
    => ("foo" "bar" "baz")
  (regexp-split " +" " foo bar baz " #:limit 2 #:trim 'both)
    => ("foo" "bar baz")

Note also that if you trim empty elements *after* splitting, then
there's a bad interaction with #:limit if you trim the left side.
Consider:

  (regexp-split " +" " foo bar baz " #:limit 3 #:trim 'both)

If we first split, taking into account the limit, we get:

  ("" "foo" "bar baz ")

and then we trim empty elements from both ends to get the final result:

  => ("foo" "bar baz")

which seems wrong, given that I asked for #:limit 3.
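
To make the difference between the two orderings concrete, here is a
rough sketch.  It is not the proposed 'regexp-split' implementation:
'split/limit', 'drop-empty-ends', 'trim-then-split' and
'split-then-trim' are hypothetical names made up for illustration, the
limit is taken to mean the maximum number of fields returned, and the
helper assumes the pattern never matches the empty string.

```scheme
(use-modules (ice-9 regex)    ; match:start, match:end
             (srfi srfi-1))   ; drop-while

;; Hypothetical helper: split STR on PATTERN into at most LIMIT fields
;; (LIMIT >= 1).  Assumes PATTERN never matches the empty string.
(define (split/limit pattern str limit)
  (let ((rx (make-regexp pattern)))
    (let loop ((start 0) (count 1) (fields '()))
      ;; Only look for another separator while a further field is allowed.
      (let ((m (and (< count limit)
                    (regexp-exec rx str start))))
        (if m
            (loop (match:end m)
                  (+ count 1)
                  (cons (substring str start (match:start m)) fields))
            ;; No more splits: the rest of the string is the last field.
            (reverse (cons (substring str start) fields)))))))

;; Remove empty strings from both ends of a list of fields.
(define (drop-empty-ends fields)
  (define (drop-leading lst) (drop-while string-null? lst))
  (reverse (drop-leading (reverse (drop-leading fields)))))

;; Ordering 1: trim the string first, then split with the limit.
(define (trim-then-split pattern str limit)
  (split/limit pattern (string-trim-both str) limit))

;; Ordering 2: split with the limit first, then drop empty end fields.
(define (split-then-trim pattern str limit)
  (drop-empty-ends (split/limit pattern str limit)))

(define s " foo bar baz ")

(trim-then-split " +" s 3)   ; => ("foo" "bar" "baz")
(trim-then-split " +" s 2)   ; => ("foo" "bar baz")
(split-then-trim " +" s 3)   ; only two fields, although the limit was 3
```

With the second ordering the leading empty field uses up one of the
three slots before it is discarded, which is the bad interaction
described above.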

Honestly, this question makes me wonder if the proposed 'regexp-split'
is too complicated.  If you want to trim whitespace, how about using
'string-trim-right' or 'string-trim-both' before splitting?  It seems
more likely to do what I would expect.

What do you think?

    Regards,
      Mark