Re: regexp-split for Guile - Chris K. Jester-Young

unofficial mirror of guile-devel@gnu.org 

From: "Chris K. Jester-Young" <cky944@gmail.com>
To: guile-devel@gnu.org
Subject: Re: regexp-split for Guile
Date: Sat, 20 Oct 2012 00:01:26 -0400
Message-ID: <20121020040126.GA25831@yarrow>
In-Reply-To: <87y5jbfj60.fsf@tines.lan>

On Fri, Oct 12, 2012 at 05:57:11PM -0400, Mark H Weaver wrote:
> FWIW, I agree with Daniel.  I dislike the complicated semantics of this
> 'limit' argument, which combines into a single number two different
> concepts:

First, I want to thank both Daniel and Mark for their feedback. I'm
sorry I haven't had a chance to reply until now; last weekend I went
to (and presented at) RacketCon, so I didn't have a lot of time for
replying to emails.

(And if you want to see my RacketCon presentation, feel free to visit
https://speakerdeck.com/u/cky/p/rackona :-))

> Beyond matters of taste, I don't like this because it makes bugs less
> likely to be caught.  Suppose 'limit' is a computed value, normally
> expected to be positive.  Code that follows may implicitly assume that
> the returned list has no more than 'limit' elements.  Now suppose that
> due to a bug or exceptional circumstance, the computed 'limit' ends up
> being less than 1.  Now 'regexp-split' switches to a qualitatively
> different mode of behavior.

I am sympathetic to this. It would definitely be good for the limit to
mean only that, and not have two other meanings attached to it.

So, in that spirit, below is a proposal that I hope fits your feedback
without making the common use cases needlessly verbose: we should
favour the common use cases by making them the easiest to write.

Before I begin, remember that in Perl's split the default limit is 0,
which means stripping off all blank trailing fields. This is the common
use case when splitting on whitespace, where you simply want to ignore
any whitespace at the end of the line. Making the calling code call
drop-right-while manually would be counter-productive for this common
use case.
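For illustration, here is a minimal sketch of the trimming step that every caller would otherwise have to repeat. The helper name `trim-trailing-blanks` is hypothetical; it uses SRFI-1's `drop-while` on the reversed list, which has the same effect as dropping blank fields from the right.

```scheme
(use-modules (srfi srfi-1))   ; for drop-while

;; Hypothetical helper: remove blank trailing fields from a list of
;; strings, as Perl's split does by default (limit 0). Blank fields in
;; the middle of the list are kept.
(define (trim-trailing-blanks fields)
  (reverse (drop-while string-null? (reverse fields))))

;; Untrimmed fields from splitting "foo  bar  baz  " on " +":
(trim-trailing-blanks '("foo" "bar" "baz" ""))
;; => ("foo" "bar" "baz")
```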

Here is my proposal:

    (regexp-split pat str #:key limit (trim? (not limit)))

With no optional arguments specified (so #:limit is #f and #:trim? is
#t), it behaves like limit == 0 in Perl, i.e., it returns all fields
minus the blank trailing ones.

With #:limit specified (it must be a positive integer), it returns at
most that many fields. Any further fields are not split out; they are
returned as part of the last field, with all delimiters intact.

With #:trim? given a false value, return all fields, including blank
trailing ones. This is false by default iff #:limit is specified.
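The semantics above can be sketched in Guile roughly as follows. This is a hypothetical implementation of the proposed interface, not an existing Guile procedure. It assumes `make-regexp`, `regexp-exec` (with its optional start offset), `match:start` and `match:end` from `(ice-9 regex)`, trims with SRFI-1's `drop-while` on a reversed list, and assumes the pattern never matches the empty string (the sketch signals an error if it does).

```scheme
(use-modules (ice-9 regex)    ; make-regexp, regexp-exec, match:start, match:end
             (srfi srfi-1))   ; drop-while

;; Sketch of the proposed interface; not Guile's actual implementation.
(define* (regexp-split pat str #:key limit (trim? (not limit)))
  ;; #:limit, when given, must be a positive integer. Anything else is
  ;; an error rather than a silent switch to a different mode.
  (when (and limit (not (and (exact-integer? limit) (positive? limit))))
    (error "regexp-split: #:limit must be a positive integer" limit))
  (let ((rx (make-regexp pat)))
    ;; START is where the current field begins; COUNT is the number of
    ;; fields there will be once the current field is emitted.
    (let loop ((start 0) (fields '()) (count 1))
      (let ((m (and (or (not limit) (< count limit))
                    (regexp-exec rx str start))))
        (cond
         ((not m)
          ;; No more splitting: the rest of the string, delimiters and
          ;; all, becomes the last field.
          (let ((all (reverse (cons (substring str start) fields))))
            (if trim?
                (reverse (drop-while string-null? (reverse all)))
                all)))
         ((= (match:start m) (match:end m))
          ;; Assumption of this sketch: empty matches are not handled.
          (error "regexp-split: empty match not supported" pat))
         (else
          (loop (match:end m)
                (cons (substring str start (match:start m)) fields)
                (+ count 1))))))))
```

Under these assumptions it reproduces each of the example results listed further down, and a computed limit such as 0 raises an error instead of changing the behaviour, which addresses the concern quoted at the top of this message.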

Rationale: the most common use case has the most succinct form, and the
next most common one has a relatively short formulation (#:trim? #f).
The default for #:trim? also follows the common use case in each
situation, depending on whether #:limit is specified. (Trimming with a
limit is not supported in Perl, but banning it here seemed like more
work than just allowing it.)

Examples:

    (regexp-split " +" "foo  bar  baz  ")
      => ("foo" "bar" "baz")
    (regexp-split " +" "foo  bar  baz  " #:trim? #f)
      => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo  bar  baz  " #:limit 4)
      => ("foo" "bar" "baz" "")
    (regexp-split " +" "foo  bar  baz  " #:limit 4 #:trim? #t)
      => ("foo" "bar" "baz")
    (regexp-split " +" "foo  bar  baz  " #:limit 3)
      => ("foo" "bar" "baz  ")
    (regexp-split " +" "foo  bar  baz  " #:limit 2)
      => ("foo" "bar  baz  ")

Does that sound reasonable?

Comments welcome,
Chris.

Thread overview:
2012-09-17 14:01 regexp-split for Guile Chris K. Jester-Young
2012-09-17 19:32 ` Thien-Thi Nguyen
2012-09-17 20:06   ` Chris K. Jester-Young
2012-09-18  7:06     ` Sjoerd van Leent Privé
2012-09-18 19:31       ` Chris K. Jester-Young
2012-09-18 19:59     ` Chris K. Jester-Young
2012-10-07  2:38       ` Daniel Hartwig
2012-10-12 21:57         ` Mark H Weaver
2012-10-20  4:01           ` Chris K. Jester-Young [this message]
2012-10-20 13:27             ` Mark H Weaver
2012-10-20 14:16               ` Mark H Weaver
2012-10-21  8:20                 ` Daniel Hartwig
2012-10-21 19:23                   ` Chris K. Jester-Young
2012-10-21 16:08                 ` Chris K. Jester-Young
2012-09-18 12:59 ` nalaginrut
2012-09-18 19:55   ` Chris K. Jester-Young
2012-09-19  0:30     ` nalaginrut
2012-10-04 21:47 ` Ludovic Courtès
