From: "Chris K. Jester-Young"
Newsgroups: gmane.lisp.guile.devel
Subject: Re: regexp-split for Guile
Date: Sun, 21 Oct 2012 12:08:02 -0400
Message-ID: <20121021160802.GB25831@yarrow>
In-Reply-To: <87pq4djkj2.fsf@tines.lan>
To: guile-devel@gnu.org

On Sat, Oct 20, 2012 at 10:16:49AM -0400, Mark H Weaver wrote:
> Sorry, that last example is wrong of course, but both of these examples
> raise an interesting question about how #:limit and #:trim should
> interact. To my mind, the top example above is correct. I think the
> last result should be "baz", not "baz ".
[...]
> Honestly, this question makes me wonder if the proposed 'regexp-split'
> is too complicated. If you want to trim whitespace, how about using
> 'string-trim-right' or 'string-trim-both' before splitting? It seems
> more likely to do what I would expect.

Thanks so much for your feedback, Mark! I appreciate it.

Yeah, I think given the left-to-right nature of regex matching, the
only kind of trimming that makes sense is a right trim. And then once
you do that, people start asking for left trim, and mayhem begins. ;-)

I do want to consider the string pre-trimming approach, as it's more
clear what's going on, and is less "magical" (where "magic" is a plus
in the Perl world, and not so much of a plus in other languages).

Thankfully, the string-trim{,-right,-both} functions you mentioned use
substring behind the scenes, which uses copy-on-write. So that solves
one of my potential concerns, which is that a pre-trim would require
copying most of the string.

                                * * *

Granted, if you want trimming-with-complicated-regex-delimiter, and not
just whitespace, then your best bet is to trim the output list. This is
slightly more complicated, because my original code simply uses
drop-while before reversing the output list for return, but since the
caller doesn't receive the reversed list, they either have to
reverse+trim+reverse (yuck), or we have to implement drop-right-while
(like you mentioned previously).

In that regard, here's one implementation of drop-right-while (that I
just wrote on the spot):

    (define (drop-right-while pred lst)
      (let recur ((lst lst))
        (if (null? lst)
            '()
            (let ((elem (car lst))
                  (next (recur (cdr lst))))
              (if (and (null? next) (pred elem))
                  '()
                  (cons elem next))))))

One could theoretically write drop-right-while! also (I can think of
two different implementation strategies) but it sounds like it's more
work than it's worth.

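For illustration, here is a rough sketch of the two approaches side by
side. The sample list and string are made up, and it assumes the
splitter hands back a plain list of strings with empty trailing fields;
string-split stands in for the proposed regexp-split, whose final
interface is not settled here.

```scheme
(use-modules (srfi srfi-1))             ; drop-while

;; drop-right-while as given above, repeated so the sketch stands alone.
(define (drop-right-while pred lst)
  (let recur ((lst lst))
    (if (null? lst)
        '()
        (let ((elem (car lst))
              (next (recur (cdr lst))))
          (if (and (null? next) (pred elem))
              '()
              (cons elem next))))))

;; Hypothetical split result with empty fields at the end.
(define fields '("foo" "" "bar" "" ""))

;; Post-trim: only the trailing empty strings go; the interior "" stays.
(define post-trimmed (drop-right-while string-null? fields))
;; => ("foo" "" "bar")

;; The reverse+trim+reverse spelling yields the same list.
(define via-reverse (reverse (drop-while string-null? (reverse fields))))
;; => ("foo" "" "bar")

;; Pre-trim: strip trailing whitespace first, then split on a space.
(define pre-trimmed
  (string-split (string-trim-right "foo bar baz  ") #\space))
;; => ("foo" "bar" "baz")
```

The pre-trim form only helps when the delimiter is something
string-trim-right can match (whitespace by default, or a character
predicate); for an arbitrary regex delimiter, trimming the output list
is the route that still works.
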
So, that's our last hurdle: we "just" have to get drop-right-while
integrated into Guile, then we can separate out the splitting and
trimming processes. And everybody will be happy. :-)

Comments welcome,
Chris.