Re: regexp-split for Guile

guile-devel

From:	Mark H Weaver
Subject:	Re: regexp-split for Guile
Date:	Sat, 20 Oct 2012 09:27:42 -0400
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)

Hi Chris,

"Chris K. Jester-Young" <address@hidden> writes:
> On Fri, Oct 12, 2012 at 05:57:11PM -0400, Mark H Weaver wrote:
>> Beyond matters of taste, I don't like this because it makes bugs less
>> likely to be caught.  Suppose 'limit' is a computed value, normally
>> expected to be positive.  Code that follows may implicitly assume that
>> the returned list has no more than 'limit' elements.  Now suppose that
>> due to a bug or exceptional circumstance, the computed 'limit' ends up
>> being less than 1.  Now 'regexp-split' switches to a qualitatively
>> different mode of behavior.
>
> I am sympathetic to this. It would definitely be good for the limit to
> mean only that, and not have two other meanings attached to it.
>
> So, in this spirit, below is my proposal for something that I hope would
> fit within the character of your feedback, while not making the common
> use cases needlessly verbose: we should favour the common use cases by
> making them easy to use.
>
> Before I begin, remember that in Perl's split, the default limit is 0,
> which is to strip off all the blank trailing fields. This is the common
> use case when using whitespace as a delimiter, where you simply want to
> ignore all the end-of-line whitespace. Making the calling code manually
> call drop-right-while is counter-productive for this common use case.
>
> Here is my proposal:
>
>     (regexp-split pat str #:key limit (trim? (not limit)))
>
> With no optional arguments specified (so, #:limit is #f and #:trim? is
> #t), it behaves like limit == 0 in Perl. i.e., return all fields, minus
> blank trailing ones.
>
> With a #:limit specified (which must be a positive integer), return
> that number of fields at most (subsequent ones are not split out, and
> are returned as part of the last field, with all delimiters intact).
>
> With #:trim? given a false value, return all fields, including blank
> trailing ones. This is false by default iff #:limit is specified.
>
> Rationale: The common use case is the most succinct version. The next
> most common use case has a relatively short formulation (#:trim?).
> Also, the default for #:trim? is based on common use cases depending on
> whether #:limit is specified. (Trim-with-limit is not supported in Perl,
> but it seemed to take more work to ban it here than just let it be.)

I generally like your new proposal, but after mulling it over some more,
I think that trimming should be off by default, regardless of how limit
is set.  The thing is, it seems to me that the only time #:trim? #t
makes sense is when you're splitting based on whitespace.  In most other
cases, trimming is not a sensible default.

As a programmer, I don't want basic tools like 'regexp-split' adding a
post-processing pass on the results without me explicitly asking for it.
Furthermore, if I add (or remove) the #:limit argument, I'd be
unpleasantly surprised to see any other changes in behavior.

While it's sometimes reasonable for _user_ interfaces to guess what the
user wanted in order to allow shorter commands, programming interfaces
should not do so, IMO.  This kind of cleverness is expected in Perl
circles, but not in the Scheme world.

Also, if we're going to add a built-in trimmer to 'regexp-split', I'd
like to see a "trim both ends" mode as well.  When splitting by
whitespace, I suspect #:trim 'both is wanted as often as #:trim 'right.

So how about something like this?

    (regexp-split pat str #:key limit trim)

where (member trim '(#f both right left))

For example:

    (regexp-split "[/\\]" "foo/bar\\baz/")
      => ("foo" "bar" "baz" "")
    (regexp-split " +" "  foo  bar  baz  ")
      => ("" "foo" "bar" "baz" "")
    (regexp-split " +" "  foo  bar  baz  " #:trim 'right)
      => ("" "foo" "bar" "baz")
    (regexp-split " +" "  foo  bar  baz  " #:trim 'both)
      => ("foo" "bar" "baz")
    (regexp-split " +" "  foo  bar  baz  " #:limit 5)
      => ("" "foo" "bar" "baz" "")
    (regexp-split " +" "  foo  bar  baz  " #:limit 5 #:trim 'right)
      => ("" "foo" "bar" "baz")
    (regexp-split " +" "  foo  bar  baz  " #:limit 5 #:trim 'both)
      => ("foo" "bar" "baz")
    (regexp-split " +" "  foo  bar  baz  " #:limit 3 #:trim 'both)
      => ("foo" "bar" "baz")
    (regexp-split " +" "  foo  bar  baz  " #:limit 2 #:trim 'both)
      => ("foo" "bar")
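
To make the trimming part concrete, here is a minimal sketch of the
post-processing pass, kept separate from the split itself.  The names
'split-fields' and 'trim-fields' are hypothetical and exist only for this
illustration; this is not the proposed implementation, it assumes the
pattern never matches the empty string, and it leaves #:limit out entirely.

```scheme
(use-modules (ice-9 regex)    ; list-matches, match:start, match:end
             (srfi srfi-1))   ; drop-while

;; Hypothetical helper: split STR on every match of PAT, keeping all
;; fields, including empty leading and trailing ones.
(define (split-fields pat str)
  (let loop ((matches (list-matches pat str))
             (start 0)
             (fields '()))
    (if (null? matches)
        (reverse (cons (substring str start) fields))
        (let ((m (car matches)))
          (loop (cdr matches)
                (match:end m)
                (cons (substring str start (match:start m)) fields))))))

;; Hypothetical helper: drop empty fields from one or both ends.
;; MODE is one of #f, 'left, 'right or 'both.
(define (trim-fields fields mode)
  (define (trim-left lst) (drop-while string-null? lst))
  (define (trim-right lst) (reverse (trim-left (reverse lst))))
  (case mode
    ((#f) fields)
    ((left) (trim-left fields))
    ((right) (trim-right fields))
    ((both) (trim-left (trim-right fields)))
    (else (error "unknown trim mode:" mode))))

(define fields (split-fields " +" "  foo  bar  baz  "))
;; fields                       => ("" "foo" "bar" "baz" "")
;; (trim-fields fields 'right)  => ("" "foo" "bar" "baz")
;; (trim-fields fields 'both)   => ("foo" "bar" "baz")
```

Keeping the trim as an explicit, separate step is the point of the
sketch: with the mode set to #f the split result is returned untouched.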

What do you think?

Thanks for working on this!

     Mark
