Rationale for split-string?

unofficial mirror of emacs-devel@gnu.org 

* Rationale for split-string?
@ 2003-04-17  9:06 Stephen J. Turnbull
  2003-04-17 11:30 ` Stefan Reichör
  2003-04-17 17:44 ` Stefan Monnier
  0 siblings, 2 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-17  9:06 UTC (permalink / raw)
  Cc: xemacs-design

What is the rationale for the specification of `split-string'?

That is, in GNU Emacs

  ;; an often convenient abbreviation
  (split-string "  data  ")
=> ("data")

  ;; weird
  (split-string "  data  " " ")
=> ("" "data" "")

  ;; urk (think "gnumeric just-say-no.xls" "save as" "csv")
  (split-string ",,data,," ",")
=> ("" "data" "")

emacs-version
"21.2.2"

In XEmacs currently we get

  ;; usually (delete "" (split-string "  data  ")) should do the
  ;; trick if you don't like this
  (split-string "  data  ")
=> ("" "data" "")

  ;; no less useful than what GNU Emacs returns
  (split-string "  data  " " ")
=> ("" "" "data" "" "")

  ;; I can't imagine wanting anything else
  (split-string ",,data,," ",")
=> ("" "" "data" "" "")

For comparison, Python's `split' function, when given an explicit
separator, behaves like XEmacs's `split-string'.  Perl's `split'
function by default removes all trailing null fields while preserving
all leading null fields, but when invoked
"split (/pattern/, string, -1)" behaves like XEmacs's `split-string'.

I think it makes sense for GNU Emacs to adopt (return to?) the
simpler, more consistent behavior, rather than have XEmacs sync to GNU
Emacs.  In particular, I think it's really unfortunate to force people
who want to parse csv data and the like to write their own functions,
while the `(delete "" (split-string ...))' idiom not only seems very
natural to me, but it handles the second example better than GNU Emacs
currently does.  And while I'm sure there exist applications where
trimming null fields at the ends but leaving them when surrounded by
non-null ones makes sense, I can't come up with one offhand.  I suspect
they're less common than either "remove all nulls" or "keep all nulls".

I believe that (at least for third-party maintainers) this change
should cause no problems, because we have had no complaints about the
behavior from anyone.  (We discovered the difference only when Ben
started a sync, and the regression test sent up flares and alarums.)

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rationale for split-string?
  2003-04-17  9:06 Rationale for split-string? Stephen J. Turnbull
@ 2003-04-17 11:30 ` Stefan Reichör
  2003-04-18  1:54   ` Richard Stallman
  2003-04-17 17:44 ` Stefan Monnier
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Reichör @ 2003-04-17 11:30 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

On Thu, 17 Apr 2003, Stephen J. Turnbull said:

>  What is the rationale for the specification of `split-string'?
>  
>  That is, in GNU Emacs
>  
>    ;; an often convenient abbreviation
>    (split-string "  data  ")
>  => ("data")
>  
>    ;; weird
>    (split-string "  data  " " ")
>  => ("" "data" "")
>  
>    ;; urk (think "gnumeric just-say-no.xls" "save as" "csv")
>    (split-string ",,data,," ",")
>  => ("" "data" "")
>  
>  emacs-version
>  "21.2.2"
>  
>  In XEmacs currently we get
>  
>    ;; usually (delete "" (split-string "  data  ")) should do the
>    ;; trick if you don't like this
>    (split-string "  data  ")
>  => ("" "data" "")
>  
>    ;; no less useful than what GNU Emacs returns
>    (split-string "  data  " " ")
>  => ("" "" "data" "" "")
>  
>    ;; I can't imagine wanting anything else
>    (split-string ",,data,," ",")
>  => ("" "" "data" "" "")
>  
>  For comparison, Python's `split' function behaves like XEmacs's
>  `split-string'. Perl's `split' function by default removes all
>  trailing null fields while preserving all leading null fields, but
>  when invoked "split (/pattern/, string, -1)" behaves like XEmacs's
>  `split-string'.
>  
>  I think it makes sense for GNU Emacs to adopt (return to?) the
>  simpler, more consistent behavior, rather than have XEmacs sync to
>  GNU Emacs. In particular, I think it's really unfortunate to force
>  people who want to parse csv data and the like to write their own
>  functions, while the `(delete "" (split-string ...))' idiom not
>  only seems very natural to me, but it handles the second example
>  better than GNU Emacs currently does. And while I'm sure there
>  exist applications where trimming null fields at the ends but
>  leaving them when surrounded by non-null ones makes sense, I can't
>  come up with one offhand. I suspect they're less common than either
>  "remove all nulls" or "keep all nulls".
>  
>  I believe that (at least for third-party maintainers) this change
>  should cause no problems, because we have had no complaints about
>  the behavior from anyone. (We discovered the difference only when
>  Ben started a sync, and the regression test sent up flares and
>  alarums.)

I noticed the different behavior of the split-string function,
because I need to parse csv output from subversion. Now I need
different code for the two platforms.

I would welcome it if GNU Emacs and XEmacs had the same
split-string implementation.

Stefan.




* Re: Rationale for split-string?
  2003-04-17 11:30 ` Stefan Reichör
@ 2003-04-18  1:54   ` Richard Stallman
  2003-04-18  2:59     ` Steve Youngs
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2003-04-18  1:54 UTC (permalink / raw)
  Cc: emacs-devel

    I would welcome it if GNU Emacs and XEmacs had the same
    split-string implementation.

I know of no reason to want them to be different.


* Re: Rationale for split-string?
  2003-04-18  1:54   ` Richard Stallman
@ 2003-04-18  2:59     ` Steve Youngs
  0 siblings, 0 replies; 35+ messages in thread
From: Steve Youngs @ 2003-04-18  2:59 UTC (permalink / raw)
  Cc: Emacs Devel

|--==> "RS" == Richard Stallman <rms@gnu.org> writes:

  RS> I know of no reason to want them to be different.

Fantastic!  Steve Turnbull is a thorough guy, so I'm sure that he will
send you a patch so you can fix GNU/Emacs' split-string.

-- 
|---<Steve Youngs>---------------<GnuPG KeyID: 10D5C9C5>---|
|        XEmacs - The only _______ you'll ever need.       |
|          Fill in the blank, yes, it's THAT good!         |
|------------------------------------<youngs@xemacs.org>---|


* Re: Rationale for split-string?
  2003-04-17  9:06 Rationale for split-string? Stephen J. Turnbull
  2003-04-17 11:30 ` Stefan Reichör
@ 2003-04-17 17:44 ` Stefan Monnier
  2003-04-17 19:32   ` Luc Teirlinck
                     ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Stefan Monnier @ 2003-04-17 17:44 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

> What is the rationale for the specification of `split-string'?
> 
> That is, in GNU Emacs
> 
>   ;; an often convenient abbreviation
>   (split-string "  data  ")
> => ("data")
> 
>   ;; weird
>   (split-string "  data  " " ")
> => ("" "data" "")
> 
>   ;; urk (think "gnumeric just-say-no.xls" "save as" "csv")
>   (split-string ",,data,," ",")
> => ("" "data" "")

I think the reason is for the default case.  In XEmacs we get:

ELISP> (split-string "  a  b  ")
("" "a" "b" "")

What is usually desired here is to eliminate all empty parts.
The `+' in the default regexp gets rid of the empty parts inside
the string, but not at the beginning and at the end, so that's
why Emacs gets rid of the empty string at the beginning and at
the end.

I agree that when the regexp used is "," or "[ \t]*,[ \t]*",
then XEmacs's behavior makes a lot more sense.

A gross hack is to test if the last char of the regexp is ?+
and if so get rid of empty strings at start and end.
It should take care of 99% of the cases.

	Stefan


* Re: Rationale for split-string?
  2003-04-17 17:44 ` Stefan Monnier
@ 2003-04-17 19:32   ` Luc Teirlinck
  2003-04-18 11:50   ` Stephen J. Turnbull
  2003-04-19  4:14   ` Richard Stallman
  2 siblings, 0 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-17 19:32 UTC (permalink / raw)
  Cc: stephen, emacs-devel, xemacs-design

Stefan Monnier wrote:

   A gross hack is to test if the last char of the regexp is ?+
   and if so get rid of empty strings at start and end.
   It should take care of 99% of the cases.

If you can not decide which of the two types of behavior is more
useful, would it not be more logical to have the behavior depend on
some optional new argument with the old behavior the default, so that
no existing code gets broken?  Gross hacks that "should" take care of
99% of the cases usually turn out to take care of something that looks
more like 66% or even 50%.  Making the behavior depend on the last
character of the regexp just looks like a very messy imprecise
heuristic.
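
To make the objection concrete, here is a small sketch of the "last
character is ?+" test.  The name `sketch-ends-in-plus-p' is
hypothetical, and the sample regexps are illustrative assumptions
rather than examples taken from existing code.

```elisp
;; Hypothetical predicate implementing the proposed heuristic.
(defun sketch-ends-in-plus-p (regexp)
  "Return non-nil if the last character of REGEXP is ?+."
  (and (> (length regexp) 0)
       (eq (aref regexp (1- (length regexp))) ?+)))

;; The intended case: a run of whitespace, classified as "trim the ends".
(sketch-ends-in-plus-p "[ \t]+")          ; non-nil

;; The same meaning written with an interval: the heuristic misses it.
(sketch-ends-in-plus-p "[ \t]\\{1,\\}")   ; nil

;; A literal plus sign used as a field separator: a false positive.
(sketch-ends-in-plus-p "\\+")             ; non-nil
```

The test looks only at how the regexp is spelled, not at what it
matches, so equivalent regexps can be treated differently.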

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-17 17:44 ` Stefan Monnier
  2003-04-17 19:32   ` Luc Teirlinck
@ 2003-04-18 11:50   ` Stephen J. Turnbull
  2003-04-18 14:17     ` Stefan Monnier
  2003-04-19 13:35     ` Richard Stallman
  2003-04-19  4:14   ` Richard Stallman
  2 siblings, 2 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-18 11:50 UTC (permalink / raw)
  Cc: Stephen J. Turnbull, emacs-devel, xemacs-design

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu> writes:

    >> What is the rationale for the specification of `split-string'?

    Stefan> I think the reason is for the default case.  In XEmacs we
    Stefan> get:

ELISP> (split-string "  a  b  ")
("" "a" "b" "")

    Stefan> What is usually desired here is to eliminate all empty
    Stefan> parts.

I tend to agree, but remember Larry Wall does not.  That concerns me;
Larry is nothing if not remarkably good at intuiting what works.  And
the (delete "" (split-string ...))  idiom is hardly an exercise in
perversion or a brainteaser.

    Stefan> A gross hack is to test if the last char of the regexp is
    Stefan> ?+ and if so get rid of empty strings at start and end.
    Stefan> It should take care of 99% of the cases.

That's an implementation, not a specification.  Using that means we'll
be having this discussion again, sooner or later.  Think about someone
who writes a smart SEPARATORS to get rid of whitespace or leaders
around the elements.  I really don't like the idea of iterating a spec
every time somebody finds a plausible use for the function that some
"less gross than the last time hack" rules out.  If you want a
specific common case optimized, test for that.

Eg, how about one of

(defun split-string-sanely (string &optional separators)
  (cond ((eq separators t) (gnu-emacs-split-string string))
        (t (xemacs-split-string string separators))))

(defun split-string-sanely-too (string &optional separators)
  (let ((result (xemacs-split-string string separators)))
    (cond ((stringp separators)        result)
          ((eq separators 'omit-nulls) (delete "" result))
          (t (error 'invalid-argument
                    "SEPARATORS must be a string or 'omit-nulls"
                    separators)))))

(defun split-string-flexibly (string &optional separators thunk)
  (let ((result (xemacs-split-string string separators)))
    (cond ((functionp thunk)      (delete-if thunk result))
          ((eq thunk 'omit-nulls) (delete "" result))
          ((null thunk)           result)
          (t (error 'invalid-argument
                    "THUNK must be nil, 'omit-nulls, or a function"
                    thunk)))))

These can be easily generalized to further useful special cases
(deleting blank strings or non-numbers, anyone?) without ever screwing
up old code or ruling out uses of a given SEPARATORS regexp.

In fact, my preference would be to implement and name more or less as
above, in which case I would default differently (e.g., if SEPARATORS
is nil, use the omit-nulls behavior).  Then the internal function
could be named `split-string' and have the simple, consistent
behavior.  Both APIs would be considered public.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-04-18 11:50   ` Stephen J. Turnbull
@ 2003-04-18 14:17     ` Stefan Monnier
  2003-04-19  8:18       ` Stephen J. Turnbull
  2003-04-19 13:35     ` Richard Stallman
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2003-04-18 14:17 UTC (permalink / raw)
  Cc: Stefan Monnier

> I tend to agree, but remember Larry Wall does not.  That concerns me;
> Larry is nothing if not remarkably good at intuiting what works.  And
> the (delete "" (split-string ...))  idiom is hardly an exercise in
> perversion or a brainteaser.

I don't think it has much to do with intuition.
He just had in mind splitting entries in /etc/passwd or tab-separated
fields or somesuch whereas Emacs coders wanted the function to extract
a list of words out of a string.
As I said, the XEmacs behavior is more regular and probably preferable.

>     Stefan> A gross hack is to test if the last char of the regexp is
>     Stefan> ?+ and if so get rid of empty strings at start and end.
>     Stefan> It should take care of 99% of the cases.
> 
> That's an implementation, not a specification.  Using that means we'll
> be having this discussion again, sooner or later.  Think about someone

Why do people assume that I'd want gross hacks in Emacs's code ?


	Stefan


* Re: Rationale for split-string?
  2003-04-18 14:17     ` Stefan Monnier
@ 2003-04-19  8:18       ` Stephen J. Turnbull
  0 siblings, 0 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-19  8:18 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu> writes:

    Stefan> As I said, the XEmacs behavior is more regular and
    Stefan> probably preferable.

Good.

How about the convenience function aspect?  Do you agree that keying
on one or more symbols for less regular, but useful, behavior is a
reasonable interface?  I would prefer to _not_ overload
`split-string', but have a second function.  I'm not wedded to that,
though.

    Stefan> Why do people assume that I'd want gross hacks in Emacs's
    Stefan> code ?

It didn't look funny, and I've fallen into the habit of taking what
you say seriously.  Should I break that habit?  :-)

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-04-18 11:50   ` Stephen J. Turnbull
  2003-04-18 14:17     ` Stefan Monnier
@ 2003-04-19 13:35     ` Richard Stallman
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2003-04-19 13:35 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, emacs-devel, xemacs-design, stephen

    (defun split-string-sanely-too (string &optional separators)
      (let ((result (xemacs-split-string string separators)))
	(cond ((stringp separators)        result)
	      ((eq separators 'omit-nulls) (delete "" result))
	      (t (error 'invalid-argument
			"SEPARATORS must be a string or 'omit-nulls"
			separators)))))

This seems like a good approach, but I would rather use t instead
of `omit-nulls'.




* Re: Rationale for split-string?
  2003-04-17 17:44 ` Stefan Monnier
  2003-04-17 19:32   ` Luc Teirlinck
  2003-04-18 11:50   ` Stephen J. Turnbull
@ 2003-04-19  4:14   ` Richard Stallman
  2003-04-19  8:55     ` Stephen J. Turnbull
  2 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2003-04-19  4:14 UTC (permalink / raw)
  Cc: emacs-devel

      RS> I know of no reason to want them to be different.

    Fantastic!  Steve Turnbull is a thorough guy, so I'm sure that he will
    send you a patch so you can fix GNU/Emacs' split-string.

First we need to figure out what is the right behavior for that
function.  People are already discussing the question...

    >   (split-string ",,data,," ",")
    > => ("" "data" "")

Is that wrong?  If so, what result do you think is right?
("" "" "data" "" "") could be argued for, but I am not sure
it is better.

    A gross hack is to test if the last char of the regexp is ?+
    and if so get rid of empty strings at start and end.
    It should take care of 99% of the cases.

That is a kludge.  Whatever we do, it should not be that.


* Re: Rationale for split-string?
  2003-04-19  4:14   ` Richard Stallman
@ 2003-04-19  8:55     ` Stephen J. Turnbull
  2003-04-21  0:59       ` Richard Stallman
  0 siblings, 1 reply; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-19  8:55 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

    >   (split-string ",,data,," ",")
    > => ("" "data" "")

    rms> Is that wrong?  If so, what result do you think is right?
    rms> ("" "" "data" "" "") could be argued for, but I am not sure
    rms> it is better.

Well, if you are parsing a comma separated value file (the standard
text/plain output format for spreadsheets and some databases, such as
subversion), the five-element list is exactly what you want, and the
three-element list is a type error (incomplete record).  In what case
would the three-element list be desirable?  I understand the case for
a one-element result, but not three.

I see basically two modes.  In one mode you are parsing fields from
each of a sequence of records, in which case you want to retain null
strings as null values.  In the other, you are parsing a (free-form)
stream of words, in which case null words (usually) don't exist, so
you want to throw away _all_ of the null strings.  In fact, all of the
whitespace-only strings, too, but those normally won't arise in the
common case where SEPARATORS matches contiguous whitespace.

I think we should support both modes, but the token-parser is easy to
derive from the field-parser, while it's impossible to do the reverse
because the token parser throws away information.  I conclude that the
field-parser (the XEmacs behavior) is more primitive, and I'd like to
call that `split-string', with either more sophisticated behavior
implemented by overloading the separators argument to take keywords
for special treatment, or (preferably) in a separate function.
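
The asymmetry can be seen on plain lists, with no splitting function
involved.  This is only a sketch: the two field lists are written out
by hand, on the assumption that they are what a keep-everything
splitter returns for ",,data,," and for "data".

```elisp
;; Field lists as a "keep all nulls" splitter would return them
;; (typed in by hand, so no particular `split-string' is assumed).
(let ((record-a (list "" "" "data" "" ""))   ; from ",,data,,"
      (record-b (list "data")))              ; from "data"
  ;; Going from fields to tokens is a one-liner...
  (list (delete "" record-a)
        (delete "" record-b)))
;; => (("data") ("data"))
;; ...but both records now look the same, so the positions of the
;; empty fields cannot be recovered from the token lists.
```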

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-04-19  8:55     ` Stephen J. Turnbull
@ 2003-04-21  0:59       ` Richard Stallman
  2003-04-21  1:55         ` Luc Teirlinck
  2003-04-21 10:58         ` Stephen J. Turnbull
  0 siblings, 2 replies; 35+ messages in thread
From: Richard Stallman @ 2003-04-21  0:59 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

    I see basically two modes.  In one mode you are parsing fields from
    each of a sequence of records, in which case you want to retain null
    strings as null values.  In the other, you are parsing a (free-form)
    stream of words, in which case null words (usually) don't exist, so
    you want to throw away _all_ of the null strings.  In fact, all of the
    whitespace-only strings, too, but those normally won't arise in the
    common case where SEPARATORS matches contiguous whitespace.

I think that makes sense.  Does anyone see a counterargument,
or a reason why any other behavior is useful?

    I think we should support both modes, but the token-parser is easy to
    derive from the field-parser, while it's impossible to do the reverse
    because the token parser throws away information.  I conclude that the
    field-parser (the XEmacs behavior) is more primitive, and I'd like to
    call that `split-string', 

I don't entirely agree.  The default case uses strings of whitespace
as the separator, and for that case, the only intelligent approach is
token-parsing.  So the function needs to be able to do token-parsing.
This feature therefore may as well also be available for any separator.




* Re: Rationale for split-string?
  2003-04-21  0:59       ` Richard Stallman
@ 2003-04-21  1:55         ` Luc Teirlinck
  2003-04-21 10:58         ` Stephen J. Turnbull
  1 sibling, 0 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-21  1:55 UTC (permalink / raw)
  Cc: stephen, emacs-devel, xemacs-design

I am personally always wary of changing existing documented behavior.
Sometimes it is necessary, but even then it is a necessary evil.
There is, of course, the possibility of breaking existing code.  We do
not know "all existing code", so just grepping through stuff does not
solve the problem.  It also makes life hard on people trying to write
packages that are portable between Emacs versions.

My own suggestion would be to add a new optional argument, say
delete-null-matches to split-string.  The value could be "all", "none"
or "edges" and maybe even "beginning" and "end" (but that would be a
luxury).  For Emacs a value of nil would be equivalent with "edges"
(the current behavior), for XEmacs it would be equivalent with "none",
XEmacs' current behavior.  No existing Emacs or XEmacs code would get
broken, and people worried about Emacs-XEmacs compatibility could
always give an explicit non-nil value, which would be interpreted in
exactly the same way by Emacs and XEmacs.
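
A rough sketch of how those values could act, written as a
post-processing step on a list that keeps every empty field.  The name
`sketch-trim-null-fields' is hypothetical, and spelling the values as
symbols rather than strings is an assumption of the sketch.  The nil
default is left out on purpose, since that is exactly where Emacs and
XEmacs would differ.

```elisp
;; Hypothetical helper; MODE names follow the proposal above.
(defun sketch-trim-null-fields (fields mode)
  "Return FIELDS with empty strings removed according to MODE.
MODE is one of the symbols all, none, edges, beginning or end.
FIELDS is assumed to be a list that keeps every empty field."
  (let ((result (copy-sequence fields)))
    (cond
     ((eq mode 'none) result)
     ((eq mode 'all) (delete "" result))
     ((memq mode '(edges beginning end))
      ;; Drop at most one empty field per edge, which is what the
      ;; GNU Emacs results quoted in this thread look like.
      (when (and (memq mode '(edges beginning))
                 (equal (car result) ""))
        (setq result (cdr result)))
      (when (and (memq mode '(edges end))
                 (equal (car (last result)) ""))
        (setq result (butlast result)))
      result)
     (t (error "Unknown MODE: %S" mode)))))

(sketch-trim-null-fields '("" "" "data" "" "") 'none)   ; => ("" "" "data" "" "")
(sketch-trim-null-fields '("" "" "data" "" "") 'edges)  ; => ("" "data" "")
(sketch-trim-null-fields '("" "" "data" "" "") 'all)    ; => ("data")
(sketch-trim-null-fields '("" "a" "b" "") 'end)         ; => ("" "a" "b")
```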

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-21  0:59       ` Richard Stallman
  2003-04-21  1:55         ` Luc Teirlinck
@ 2003-04-21 10:58         ` Stephen J. Turnbull
  2003-04-21 21:11           ` Luc Teirlinck
  2003-04-23  1:00           ` Richard Stallman
  1 sibling, 2 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-21 10:58 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    rms> I don't entirely agree.  The default case uses strings of
    rms> whitespace as the separator, and for that case, the only
    rms> intelligent approach is token-parsing.  So the function needs
    rms> to be able to do token-parsing.

I was afraid of that (I prefer regular behavior over intelligent
behavior if I must make a choice), but I can live with it.  I really
would prefer a separate `tokenize-string' function, though.  (That
name is not used in the GNU Emacs or XEmacs cores, or anywhere in the
XEmacs packages.  Several packages have their own tokenize functions
but they're all properly prefixed, and one might fear semantic would
use the name, but it doesn't.)

    rms> This feature therefore may as well also be available for any
    rms> separator.

But that's not compatible with a *single* function with *two*
arguments.  So I suppose you want a simpler version of Luc Teirlinck's
suggestion.  How about:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; one function, three arguments

(defun split-string (string &optional separators omit-nulls)

  "Splits STRING into substrings bounded by matches for SEPARATORS.

The beginning and end of STRING, and each match for SEPARATORS, are
splitting points.  The substrings between the splitting points are
collected in a list, which is returned.  (The substrings matching
SEPARATORS are removed.)

If SEPARATORS is nil, it defaults to \"[ \\f\\t\\n\\r\\v]+\".

If OMIT-NULLS is t, zero-length substrings are omitted from the list
(so that for the default value of SEPARATORS leading and trailing
whitespace are trimmed).  If nil, all zero-length substrings are
retained, which correctly parses CSV format, for example."

  ;; implementation
  )

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-04-21 10:58         ` Stephen J. Turnbull
@ 2003-04-21 21:11           ` Luc Teirlinck
  2003-04-21 23:43             ` Miles Bader
  2003-04-23  1:00           ` Richard Stallman
  1 sibling, 1 reply; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-21 21:11 UTC (permalink / raw)
  Cc: emacs-devel

Stephen Turnbull wrote:

   How about:

   ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
   ;;
   ;; one function, three arguments

   (defun split-string (string &optional separators omit-nulls)

     "Splits STRING into substrings bounded by matches for SEPARATORS.

   The beginning and end of STRING, and each match for SEPARATORS, are
   splitting points.  The substrings between the splitting points are
   collected in a list, which is returned.  (The substrings matching
   SEPARATORS are removed.)

   If SEPARATORS is nil, it defaults to \"[ \\f\\t\\n\\r\\v]+\".

   If OMIT-NULLS is t, zero-length substrings are omitted from the list
   (so that for the default value of SEPARATORS leading and trailing
   whitespace are trimmed).  If nil, all zero-length substrings are
   retained, which correctly parses CSV format, for example."

     ;; implementation
     )

There are two problems with this.  First of all, it would break tons
of existing Emacs code.  Secondly, the defaults for SEPARATORS and for
OMIT-NULLS do not match.  Thus, the most routine call of 
(split-string string) would produce nonsensical results in the case of
leading or trailing whitespace.

Something like

(split-string string &optional separators keep-nulls)

that is, the same as your proposal but with the roles of nil and t
reversed would take care of the second objection and also break less
existing Emacs code (but probably still enough to worry about).  Of
course the reduction in broken Emacs code would probably come at the
expense of breaking existing XEmacs code.

With your proposal, we would have to replace plenty of occurrences of
(split-string string) in Emacs with (split-string string nil t).  To
do that automatically, we would have to change all of them.  There is
plenty of Elisp code that is not included in either the Emacs or
XEmacs distributions, but that might still be important to plenty of
people.  We can not change that code.  Code compatible between
different Emacs versions would have to become more complex.  The
reverse version of your proposal would eliminate this part of the
problem, but probably produce a similar problem for XEmacs.  With the
reverse proposal above, we would not have to worry about Emacs calls
to split-string with the default-value for SEPARATORS, but one still
would have to go through all occurrences of split-string with
non-default values of SEPARATORS, at the very least in all .el files
in the Lisp directory and all its subdirectories, and very carefully
check which ones the change would break and fix all those.
(Personally I do not have the time to do that.)  Even if somebody
finds the time to do all of this, we can not check and fix Elisp code
not included in the Emacs or XEmacs distributions.

The point of my proposal (possible values "all","none" and "edges" for
omit-nulls with nil being equivalent with "edges" in Emacs and with
"none" in XEmacs) was to avoid breaking any existing Emacs or XEmacs
code while still making it trivial to use split-string in a way that
works identically in Emacs and XEmacs.  Again, in that proposal, only
"edges" as an additional value for omit-nulls is necessary to avoid
breaking existing Emacs code.  I only mentioned "beginning" and "end"
as luxury possibilities.  I know of software packages that use the
"end" version and the "end" version actually does make a lot of sense
in plenty of situations, like splitting a file or buffer into lines,
where a leading newline does represent an empty line, but a trailing
one does not represent an additional empty line following it.  The
"end" (as well as the "beginning") behavior is, however, trivial to
obtain from the "none" behavior, so that it would be a luxury.  ("end"
would be a nice luxury, "beginning" would probably be a "luxury
luxury" for symmetry with "end".)

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-21 21:11           ` Luc Teirlinck
@ 2003-04-21 23:43             ` Miles Bader
  2003-04-22  3:26               ` Luc Teirlinck
  0 siblings, 1 reply; 35+ messages in thread
From: Miles Bader @ 2003-04-21 23:43 UTC (permalink / raw)
  Cc: stephen, emacs-devel, xemacs-design, rms

On Mon, Apr 21, 2003 at 04:11:21PM -0500, Luc Teirlinck wrote:
>    (defun split-string (string &optional separators omit-nulls)
>
> There are two problems with this.  First of all, it would break tons
> of existing Emacs code.  Secondly, the defaults for SEPARATORS and for
> OMIT-NULLS do not match.  Thus, the most routine call of 
> (split-string string) would produce nonsensical results in the case of
> leading or trailing whitespace.

Other than the all-defaults case (where _both_ optional arguments are
omitted), I think Stephen's formulation is very natural, in that you usually
want OMIT-NULLS to be t if you're splitting on a non-whitespace string.

I think the problem with the all-defaults case could be solved by having
OMIT-NULLS default to t when SEPARATORS is not specified.  This is what awk
does I think (with split), and it's really very natural.

[IOW, at the beginning of the function, put:
  (unless separators (setq omit-nulls t))
]
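
Put together with the three-argument signature, that might look
roughly like the sketch below.  It is written from scratch under a
hypothetical name, `sketch-split-string', so that it does not depend
on which `split-string' a given Emacs ships, and it assumes SEPARATORS
never matches the empty string.

```elisp
;; Hypothetical name, so the real `split-string' is left untouched.
(defun sketch-split-string (string &optional separators omit-nulls)
  "Split STRING at matches of the regexp SEPARATORS.
With SEPARATORS nil, split on whitespace and always omit empty fields.
Otherwise empty fields are kept unless OMIT-NULLS is non-nil."
  ;; The suggested defaulting: no SEPARATORS implies OMIT-NULLS.
  (unless separators (setq omit-nulls t))
  (let ((regexp (or separators "[ \f\t\n\r\v]+"))
        (start 0)
        (fields nil))
    ;; Collect every field, empty or not.  The zero-length-match
    ;; guard only keeps this sketch from looping forever.
    (while (and (string-match regexp string start)
                (< (match-beginning 0) (match-end 0)))
      (setq fields (cons (substring string start (match-beginning 0)) fields)
            start (match-end 0)))
    ;; Whatever follows the last separator is the final field.
    (setq fields (nreverse (cons (substring string start) fields)))
    (if omit-nulls (delete "" fields) fields)))

(sketch-split-string "  data  ")          ; => ("data")
(sketch-split-string ",,data,," ",")      ; => ("" "" "data" "" "")
(sketch-split-string ",,data,," "," t)    ; => ("data")
```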

-Miles
-- 
We are all lying in the gutter, but some of us are looking at the stars.
-Oscar Wilde




* Re: Rationale for split-string?
  2003-04-21 23:43             ` Miles Bader
@ 2003-04-22  3:26               ` Luc Teirlinck
  2003-04-22  4:09                 ` Jerry James
  2003-04-22 13:19                 ` Stephen J. Turnbull
  0 siblings, 2 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-22  3:26 UTC (permalink / raw)
  Cc: stephen, emacs-devel, xemacs-design, rms

Miles Bader wrote:

   I think Stephen's formulation is very natural, in that you usually
   want OMIT-NULLS to be t if you're splitting on a non-whitespace
   string.

First of all, I am not worried about Stephen's formulation being
unnatural (although the original formulation actually would produce
unnatural results in the default case), but about it breaking existing
code.

I believe you are underestimating the level of generality of
split-string and the wild heterogeneity of its applications.  It is by
no means whatsoever true that except in the whitespace case you would
want to keep all null matches.  If SEPARATORS is a "terminator
character", say newline, then a null match at the beginning counts.
There is no reason you would start the string with a terminator other
than to explicitly terminate an empty string.  The empty match at the
end does not count, because the terminator at that place just
terminates the previous match.  This is, for instance, how you would
want to split a buffer, or a file, or user input, into lines.  The way
you implement that with the current split-string is to first check for
an initial terminator and, if there is one, prepend an empty string to
the split-string output.  With the proposed new split-string, you
delete the empty match at the end from the split-string output.  That
is actually easier.  However...
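
A minimal sketch of the terminator-style line splitting described above,
assuming a `split-string' that retains null fields when SEPARATORS is
given explicitly (the proposed behavior).  The function name
`sketch-split-lines' is hypothetical.

```elisp
;; Hypothetical helper: treat newline as a terminator, not a separator.
(defun sketch-split-lines (string)
  "Split STRING into lines, treating each newline as a line terminator.
An initial newline yields an initial empty line; a final newline only
terminates the last line and adds nothing."
  (let ((fields (split-string string "\n")))
    ;; With null fields retained, a trailing terminator leaves exactly
    ;; one empty string at the end of the list; drop it.
    (if (and fields (string= (car (last fields)) ""))
        (butlast fields)
      fields)))

(sketch-split-lines "\na\nb\n")   ; leading empty line kept, trailing one dropped
```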

The "however" is that we are not defining a *new* function but
*re*defining an *existing* function, an often used and extremely
general existing function.  That is all but guaranteed to produce a
wild variety of bugs.

In fact let us assume, for the sake of argument, that Stephen and you
are 100% right.  That would mean that any correct existing code, using
the present Emacs split-string with a non-nil SEPARATORS, checks for
empty matches at the beginning and end and adds any such matches to
the split-string output to correct the "bug" in the present
split-string.  After Stephen's change, any empty match at the
beginning and end of the string will produce not one, but two empty
strings.
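
To make the concern concrete, here is a sketch of the kind of
compensating caller described above.  It is a hypothetical example, not
code taken from any package, and for brevity it only handles the
beginning of the string.

```elisp
;; Hypothetical caller written against the trimming `split-string':
;; it re-adds the leading empty field by hand.
(defun sketch-split-with-leading-field (string separator)
  "Split STRING on regexp SEPARATOR, restoring a leading empty field."
  (let ((fields (split-string string separator)))
    ;; If STRING starts with a separator, prepend the field that the
    ;; trimming implementation would have dropped.
    (when (string-match (concat "\\`\\(?:" separator "\\)") string)
      (push "" fields))
    fields))

(sketch-split-with-leading-field ",a" ",")
```

Under the trimming implementation this returns one leading empty string;
once `split-string' itself retains the leading field, the same call
returns two.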

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-22  3:26               ` Luc Teirlinck
@ 2003-04-22  4:09                 ` Jerry James
  2003-04-22  8:15                   ` Eli Zaretskii
  2003-04-22 12:56                   ` Luc Teirlinck
  2003-04-22 13:19                 ` Stephen J. Turnbull
  1 sibling, 2 replies; 35+ messages in thread
From: Jerry James @ 2003-04-22  4:09 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

Luc Teirlinck <teirllm@dms.auburn.edu> wrote:
> First of all, I am not worried about Stephen's formulation being
> unnatural (although the original formulation actually would produce
> unnatural results in the default case), but about it breaking existing
> code.

[snip]

> The "however" is that we are not defining a *new* function but
> *re*defining an *existing* function, an often used and extremely
> general existing function.  That is all but guaranteed to produce a
> wild variety of bugs.

Speaking of existing code, it's worth making a couple more points.  It
appears to me that Emacs 21.1 contained a version with the same behavior
as XEmacs'; that is, it produced empty strings at the beginning and end
in the cases of interest.  Emacs 21.4 contained the current version,
that discards such empty strings.  So did anybody on the Emacs team
worry about breaking existing code when 21.4 was released, nearly 4
years ago?  If so, what steps were taken to counter such breakage?  Did
"a wild variety of bugs" appear at the time?  Are there any mail
archives of emacs-devel available from back then?

Furthermore, how much code will just work, whether the empty strings are
present or not?  After all, Emacs' current implementation can still
produce results containing empty strings, and doesn't even live up to
its docstring's promise of not having any at the beginning or end, as
some of Stephen's examples show, so any split-string clients still have
to deal with such strings.  How much code uses the delete idiom to throw
the empty strings away?  That code wouldn't notice the change.

I did a lot of digging through the XEmacs package code a little while
ago while researching this issue.  I didn't see any code that
conditionalized on the version of split-string (although I did not make
a complete tour, either), so I suspect that a lot of code still assumes
the semantics of the old version, and just never noticed that some empty
strings don't appear any more.

In short, is there any reason to believe that it wouldn't break LESS
code to revert to the old version and pretend that the last 4 years
never happened?
-- 
Jerry James
http://www.ittc.ku.edu/~james/


* Re: Rationale for split-string?
  2003-04-22  4:09                 ` Jerry James
@ 2003-04-22  8:15                   ` Eli Zaretskii
  2003-04-22 13:22                     ` Stephen J. Turnbull
  2003-04-22 12:56                   ` Luc Teirlinck
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2003-04-22  8:15 UTC (permalink / raw)
  Cc: emacs-devel

> From: Jerry James <james@xemacs.org>
> Date: 21 Apr 2003 23:09:31 -0500
> 
> It
> appears to me that Emacs 21.1 contained a version with the same behavior
> as XEmacs'; that is, it produced empty strings at the beginning and end
> in the cases of interest.  Emacs 21.4 contained the current version,
> that discards such empty strings.  So did anybody on the Emacs team
> worry about breaking existing code when 21.4 was released, nearly 4
> years ago?

There's some confusion (or maybe typos) here: Emacs 21.4 is not
released yet, certainly not 4 years ago.  The latest Emacs version is
21.3, released about 2 weeks ago.  Perhaps you got the versions wrong
or something.


* Re: Rationale for split-string?
  2003-04-22  8:15                   ` Eli Zaretskii
@ 2003-04-22 13:22                     ` Stephen J. Turnbull
  2003-04-22 14:38                       ` Jerry James
  0 siblings, 1 reply; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-22 13:22 UTC (permalink / raw)
  Cc: james, emacs-devel, xemacs-design, teirllm

>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:

> From: Jerry James <james@xemacs.org>
> Date: 21 Apr 2003 23:09:31 -0500
> 
> It
> appears to me that Emacs 21.1 contained a version with the same behavior
> as XEmacs'

    Eli> There's some confusion (or maybe typos) here: Emacs 21.4 is
    Eli> not released yet, certainly not 4 years ago.  The latest
    Eli> Emacs version is 21.3, released about 2 weeks ago.  Perhaps
    Eli> you got the versions wrong or something.

It's a typo.  Try `cvs diff -r EMACS_20_2 -r EMACS_20_4 subr.el'.
Look for the hunk at line 956.


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.




* Re: Rationale for split-string?
  2003-04-22 13:22                     ` Stephen J. Turnbull
@ 2003-04-22 14:38                       ` Jerry James
  0 siblings, 0 replies; 35+ messages in thread
From: Jerry James @ 2003-04-22 14:38 UTC (permalink / raw)
  Cc: Eli Zaretskii, emacs-devel, xemacs-design, teirllm

"Stephen J. Turnbull" <stephen@xemacs.org>, on Tue, 22 Apr 2003 at
22:22:51 +0900 you wrote:
> >>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:
> 
> > From: Jerry James <james@xemacs.org>
> > Date: 21 Apr 2003 23:09:31 -0500
> > 
> > It
> > appears to me that Emacs 21.1 contained a version with the same behavior
> > as XEmacs'
> 
>     Eli> There's some confusion (or maybe typos) here: Emacs 21.4 is
>     Eli> not released yet, certainly not 4 years ago.  The latest
>     Eli> Emacs version is 21.3, released about 2 weeks ago.  Perhaps
>     Eli> you got the versions wrong or something.
> 
> It's a typo.  Try `cvs diff -r EMACS_20_2 -r EMACS_20_4 subr.el'.
> Look for the hunk at line 956.

Right.  Sorry.  I should know better than to try composing coherent
email just before going to bed.  I meant 20.1 and 20.4, of course.
-- 
Jerry James
http://www.ittc.ku.edu/~james/




* Re: Rationale for split-string?
  2003-04-22  4:09                 ` Jerry James
  2003-04-22  8:15                   ` Eli Zaretskii
@ 2003-04-22 12:56                   ` Luc Teirlinck
  2003-04-22 14:56                     ` Jerry James
  1 sibling, 1 reply; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-22 12:56 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

I am not going to respond to the essence of your statement, since it
does not have any.  (It is just emotional stuff, it has no rational
content.)  I just want to point out that I am not an official
spokesperson for Emacs.  I represent my own opinions, not those of
Emacs or "the Emacs developers".  Any "Evil Intents" you seem to be
attributing to Emacs and the Emacs developers are strictly and
completely my own personal Evilness.

Sincerely,

Luc Teirlinck.


* Re: Rationale for split-string?
  2003-04-22 12:56                   ` Luc Teirlinck
@ 2003-04-22 14:56                     ` Jerry James
  2003-04-22 15:27                       ` Luc Teirlinck
  0 siblings, 1 reply; 35+ messages in thread
From: Jerry James @ 2003-04-22 14:56 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

Luc Teirlinck <teirllm@dms.auburn.edu> wrote:
> I am not going to respond to the essence of your statement, since it
> does not have any.  (It is just emotional stuff, it has no rational
> content.)  I just want to point out that I am not an official
> spokesperson for Emacs.  I represent my own opinions, not those of
> Emacs or "the Emacs developers".  Any "Evil Intents" you seem to be
> attributing to Emacs and the Emacs developers are strictly and
> completely my own personal Evilness.

You have to realize that I'm an academic, Luc.  I asked the questions I
asked, not to accuse or belittle anybody, but as an exercise in the
Socratic method (perhaps a poor one, but that's another discussion).  If
you reread my last message with that in mind, I think you will see that
the case is the opposite of what you assumed: it is all rational
content; none of it is emotional.  Let me summarize the main points I
wanted to make:

1) Some of the resistance to changing Emacs' split-string function is
   coming from people who are worried about breaking existing code.

2) XEmacs has not changed the split-string function (except in the
   development version, which is where we noticed the test breakage that
   prompted all this).

3) Emacs changed the split-string function, somewhere after version 20.1
   was released, and before 20.4 was released.

4) If no code broke at the time, then we have nothing to worry about,
   because no code at all notices the difference.

5) If some code broke, then knowing which code it is that broke is
   relevant to this discussion; hence the question about the existence
   of emacs-devel archives.

The thought of anyone having any kind of evil intent never crossed my
mind.

Regards,
-- 
Jerry James
http://www.ittc.ku.edu/~james/


* Re: Rationale for split-string?
  2003-04-22 14:56                     ` Jerry James
@ 2003-04-22 15:27                       ` Luc Teirlinck
  0 siblings, 0 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-22 15:27 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

I interpreted your original message as suggesting that I was a
hypocrite (and that other people involved with Emacs were hypocrites)
because I was worrying about breaking existing code now, whereas
nobody connected with Emacs development worried about breaking
existing code four years ago.  (The answer to that is that I did not
subscribe to emacs-devel four years ago and that I represent my own
opinions, not other people's opinions.)

I am willing to believe that you did not mean to suggest the above, in
which case I overreacted to your message.  Sorry.

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-22  3:26               ` Luc Teirlinck
  2003-04-22  4:09                 ` Jerry James
@ 2003-04-22 13:19                 ` Stephen J. Turnbull
  2003-04-22 13:39                   ` Miles Bader
                                     ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-22 13:19 UTC (permalink / raw)
  Cc: miles, emacs-devel, xemacs-design, rms

>>>>> "Luc" == Luc Teirlinck <teirllm@dms.auburn.edu> writes:

    Luc> Miles Bader wrote:

    mb>    I think Stephen's formulation is very natural, in that you
    mb> usually want OMIT-NULLS to be t if you're splitting on a
    mb> non-whitespace string.

Miles, here you meant OMIT-NULLS to be nil, right?

I think Miles's proposal to default the one-argument form of
`split-string' to GNU behavior and have the two-argument form as
XEmacs's, with the three argument form for precise control, is a good
compromise.  Add

    (defconst split-string-default-separators "[ \f\t\n\r\v]+"
      "The default value of separators for `split-string'.

    A regexp matching strings of whitespace.  May be locale-dependent
    \(as yet unimplemented).  Should not match non-breaking spaces.")

and the current XEmacs behavior is very naturally available with

    (split-string string split-string-default-separators)

(although the fact that that means something different from
`(split-string string)' is definitely a wart).
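
The difference between the two forms can be illustrated as follows,
assuming an Emacs in which `split-string-default-separators' is defined
as above and explicit SEPARATORS retain null fields:

```elisp
;; One-argument form: whitespace splitting with null fields omitted.
(split-string "  data  ")
;; => ("data")

;; Same regexp passed explicitly: null fields at both ends are retained.
(split-string "  data  " split-string-default-separators)
;; => ("" "data" "")
```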

------------------------------------------------------------------------
Back to our regularly scheduled controversy on principles:

    Luc> First of all, I am not worried about Stephen's formulation
    Luc> being unnatural (although the original formulation actually
    Luc> would produce unnatural results in the default case), but
    Luc> about it breaking existing code.

GNU Emacs made the change (viz. cvs diff -r EMACS_20_2 -r EMACS_20_4
subr.el) without worrying sufficiently about breaking existing code
(see Stefan Reichör's post here <uvfxduzt2.fsf@riic.at>, or run
XEmacs's regression test suite on XEmacs 21.5).  I don't see why that
should be a barrier to reverting to the old, regular, behavior now.

Further, as far as GNU Emacs itself goes, I see your theory and raise
you a full-tree patch.  I volunteer to revise the code and fix the
callers in all GNU Emacs code distributed on the mainline.  (I've
already requested papers from rms.)

Sure, we can't guarantee that third party code won't get broken, but
Jerry James has anted an audit of all XEmacs code including the
packages, a significant fraction of 3rd party Emacs Lisp code.
Nothing there will break, although once we get this settled, many
packages can have their local versions of `split-string' either thrown
out or turned into trivial defsubsts around the core version.  Want to
match Jerry's effort with some facts here?  Find us some callers,
we'll send patches to their maintainers.

    Luc> I believe you are underestimating the level of generality of
    Luc> split-string and the wild heterogeneity of its applications.

Et tu, Luc.  You don't imagine using split-string to parse Makefiles
or Python code[1], to detect trailing whitespace (perhaps generated by
older auto-fill implementations to mark sentence breaks) that violates
coding standards, etc.  (Not surprising, since GNU Emacs 21.x can't do
those things using `split-string'.)

Since generality and heterogeneity are much better served by simple
regular interfaces, what you are really arguing is quite the opposite.
Ie, that there's only one important application (splitting into tokens
separated by non-significant whitespace).  And you want the
`split-string' API optimized for that and very similar applications by
default, even though that means that `split-string's non-default
behavior looks totally schizophrenic by comparison.  A lot of people
agree with you (including rms AFAICT), but others don't.  Many XEmacs
people disagree strongly.  (They prefer regularity.)

    Luc> It is by no means whatsoever true that except in the
    Luc> whitespace case you would want to keep all null matches.  If
    Luc> SEPARATORS is a "terminator character", say newline,

Note that Miles's proposal would actually give the behavior you want
in `(split-string string "\n")'.  (Admittedly, you'd like
`(split-string string "\n" 'end)' even better.)  Point for Miles!

But you are exactly right: sometimes one wants it one way, and
sometimes the other.  It is this _irreconcilable_ difference that
leads me to strongly prefer separate APIs, one which imposes
stream-of-token semantics, and one which merely splits strings.  I
think `split-string' is a more natural name for the latter.

    Luc> The "however" is that we are not defining a *new* function
    Luc> but *re*defining an *existing* function, an often used and
    Luc> extremely general existing function.  That is all but
    Luc> guaranteed to produce a wild variety of bugs.

Please consider the history of the change.  You're inaccurate on all
counts.  We propose _reverting_ what is already a redefinition.
Because the redefined function is _less general_ than the original,
it's _used less often_ than it could be.  (Jerry James's audit of
XEmacs and package code demonstrates this.)  And it won't "produce"
bugs, it will _exchange_ a new set of unknown bugs (which is likely to
be small everywhere except in code very specific to GNU Emacs 21) for
a set of existing bugs, which everybody agrees need to be fixed.

So the question basically boils down to whether it makes sense to have
a regular, easily understood definition with exceptions restricted to
a few very clear cases with consensus support, or to aggressively make
"plausible" exceptions.  The last time GNU Emacs did the latter with
this function, it clearly screwed up.

    Luc> In fact let us assume, for the sake of argument, that Stephen
    Luc> and you are 100% right.  That would mean that any correct
    Luc> existing code, using the present Emacs split-string with a
    Luc> non-nil SEPARATORS, checks for empty matches at the beginning
    Luc> and end and adds any such matches to the split-string output
    Luc> to correct the "bug" in the present split-string.  After
    Luc> Stephen's change, any empty match at the beginning and end of
    Luc> the string will produce not one, but two empty strings.

That's silly; what anybody sane would do in the face of GNU Emacs's
demonstrated willingness to change semantics of such a fundamental
function is to copy the old definition into their own code.  It would
probably be shorter, and surely simpler and faster, than the gross
hack you propose.

Footnotes: 
[1] (defun python-parse-indentation (line) 
      (let ((i 0)
            (line (split-string line python-single-indentation)))
        (while (string= (car line) "")
          (setq i (1+ i))
          (setq line (cdr line)))
        (cons i line)))
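
A usage sketch for the footnote's function.  The function is reproduced
so the example is self-contained; the value of
`python-single-indentation' (four spaces) is an assumption made for this
illustration, and a `split-string' that retains leading null fields is
required.

```elisp
;; Assumption: one indentation level is exactly four spaces.
(defvar python-single-indentation "    "
  "Regexp matching one indentation level (assumed value for this sketch).")

(defun python-parse-indentation (line)
  "Return (DEPTH . FIELDS) for LINE, counting leading empty fields."
  (let ((i 0)
        (line (split-string line python-single-indentation)))
    ;; Each leading empty field corresponds to one indentation level.
    (while (string= (car line) "")
      (setq i (1+ i))
      (setq line (cdr line)))
    (cons i line)))

(python-parse-indentation "        x = 1")
;; => (2 "x = 1")
```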

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-04-22 13:19                 ` Stephen J. Turnbull
@ 2003-04-22 13:39                   ` Miles Bader
  2003-04-22 13:51                   ` Luc Teirlinck
  2003-04-22 16:26                   ` Luc Teirlinck
  2 siblings, 0 replies; 35+ messages in thread
From: Miles Bader @ 2003-04-22 13:39 UTC (permalink / raw)
  Cc: Luc Teirlinck, emacs-devel, xemacs-design, rms

On Tue, Apr 22, 2003 at 10:19:31PM +0900, Stephen J. Turnbull wrote:
>     mb>    I think Stephen's formulation is very natural, in that you
>     mb> usually want OMIT-NULLS to be t if you're splitting on a
>     mb> non-whitespace string.
> 
> Miles, here you meant OMIT-NULLS to be nil, right?

Yeah.

-Miles
-- 
I'd rather be consing.




* Re: Rationale for split-string?
  2003-04-22 13:19                 ` Stephen J. Turnbull
  2003-04-22 13:39                   ` Miles Bader
@ 2003-04-22 13:51                   ` Luc Teirlinck
  2003-04-22 16:26                   ` Luc Teirlinck
  2 siblings, 0 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-22 13:51 UTC (permalink / raw)
  Cc: miles, emacs-devel, xemacs-design, rms


Stephen Turnbull wrote:

   GNU Emacs made the change (viz. cvs diff -r EMACS_20_2 -r EMACS_20_4
   subr.el) without worrying sufficiently about breaking existing code
   (see Stefan Reichör's post here <uvfxduzt2.fsf@riic.at>, or run
   XEmacs's regression test suite on XEmacs 21.5).  I don't see why that
   should be a barrier to reverting to the old, regular, behavior now.

I did not know the history of the function.  I did not subscribe to
this site four years ago.  If I had, I would probably have opposed the
original change back then, which might not have made any difference
anyway.  I am not part of some "Conspiracy" started four years ago as
others seem to suggest.  Anyway, I hope this answers the question of
"And where were you four years ago?" which others asked me.  I did not
subscribe to emacs-devel back then.

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-22 13:19                 ` Stephen J. Turnbull
  2003-04-22 13:39                   ` Miles Bader
  2003-04-22 13:51                   ` Luc Teirlinck
@ 2003-04-22 16:26                   ` Luc Teirlinck
  2 siblings, 0 replies; 35+ messages in thread
From: Luc Teirlinck @ 2003-04-22 16:26 UTC (permalink / raw)
  Cc: miles, xemacs-design, rms, emacs-devel

Stephen Turnbull wrote:

   Note that Miles's proposal would actually give the behavior you want
   in `(split-string string "\n")'.  (Admittedly, you'd like
   `(split-string string "\n" 'end)' even better.)  Point for Miles!

Just to make sure I understand what you are proposing:

I could not just do (split-string string "\n"), I would first have to
check whether the string ended in a newline and, if so, remove that
newline before calling split-string (or do something else).  Otherwise
split-string would return a "fake" empty line at the end of a newline
terminated buffer or file.  (Correct?)  Or are you actually suggesting
removing a final empty match but keeping any initial empty match,
exactly the behavior I suggested for "end"?

That is, would (split-string "\n" "\n") return ("" "") or ("") ?

Sincerely,

Luc.


* Re: Rationale for split-string?
  2003-04-21 10:58         ` Stephen J. Turnbull
  2003-04-21 21:11           ` Luc Teirlinck
@ 2003-04-23  1:00           ` Richard Stallman
  2003-04-23  4:09             ` Stephen J. Turnbull
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2003-04-23  1:00 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

      So I suppose you want a simpler version of Luc Teirlinck's
    suggestion.  How about:

    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;
    ;; one function, three arguments

    (defun split-string (string &optional separators omit-nulls)

      "Splits STRING into substrings bounded by matches for SEPARATORS.

    The beginning and end of STRING, and each match for SEPARATORS, are
    splitting points.  The substrings between the splitting points are
    collected in a list, which is returned.  (The substrings matching
    SEPARATORS are removed.)

    If SEPARATORS is nil, it defaults to \"[ \\f\\t\\n\\r\\v]+\".

    If OMIT-NULLS is t, zero-length substrings are omitted from the list
    (so that for the default value of SEPARATORS leading and trailing
    whitespace are trimmed).  If nil, all zero-length substrings are
    retained, which correctly parses CSV format, for example."

That seems like the right thing, except I think that if SEPARATORS is
nil, OMIT-NULLS should default to t.
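
For illustration, these are the results the specification with this
amendment would give, assuming an Emacs that implements it:

```elisp
;; Explicit separator, OMIT-NULLS nil: every null field is retained,
;; so CSV-style input keeps its empty columns.
(split-string ",,data,," ",")
;; => ("" "" "data" "" "")

;; Explicit separator, OMIT-NULLS t: null fields are dropped.
(split-string ",,data,," "," t)
;; => ("data")

;; SEPARATORS nil: OMIT-NULLS defaults to t, so whitespace is trimmed.
(split-string "  two  words  ")
;; => ("two" "words")
```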




* Re: Rationale for split-string?
  2003-04-23  1:00           ` Richard Stallman
@ 2003-04-23  4:09             ` Stephen J. Turnbull
  2003-04-24 23:12               ` Richard Stallman
  2003-05-20  1:55               ` Stephen J. Turnbull
  0 siblings, 2 replies; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-04-23  4:09 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    (defun split-string (string &optional separators omit-nulls)

      "Splits STRING into substrings bounded by matches for SEPARATORS.

    The beginning and end of STRING, and each match for SEPARATORS, are
    splitting points.  The substrings between the splitting points are
    collected in a list, which is returned.  (The substrings matching
    SEPARATORS are removed.)

    If SEPARATORS is nil, it defaults to \"[ \\f\\t\\n\\r\\v]+\".

    If OMIT-NULLS is t, zero-length substrings are omitted from the list
    (so that for the default value of SEPARATORS leading and trailing
    whitespace are trimmed).  If nil, all zero-length substrings are
    retained, which correctly parses CSV format, for example."

    rms> That seems like the right thing, except I think that if
    rms> SEPARATORS is nil, OMIT-NULLS should default to t.

OK.  That is satisfactory for XEmacs, and we'll implement that.

Unless you say you prefer to do it yourself, I will also submit a
patch against GNU Emacs CVS head, and audit the Lisp code in CVS head
to make sure there are no surprises from callers with non-default
SEPARATORS.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.




* Re: Rationale for split-string?
  2003-04-23  4:09             ` Stephen J. Turnbull
@ 2003-04-24 23:12               ` Richard Stallman
  2003-05-20  1:55               ` Stephen J. Turnbull
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2003-04-24 23:12 UTC (permalink / raw)
  Cc: emacs-devel, xemacs-design

    Unless you say you prefer to do it yourself, I will also submit a
    patch against GNU Emacs CVS head, and audit the Lisp code in CVS head
    to make sure there are no surprises from callers with non-default
    SEPARATORS.

That would be very kind of you.  To use the same code can't hurt, and
may help.




* Re: Rationale for split-string?
  2003-04-23  4:09             ` Stephen J. Turnbull
  2003-04-24 23:12               ` Richard Stallman
@ 2003-05-20  1:55               ` Stephen J. Turnbull
  2003-05-22 15:00                 ` Kai Großjohann
  1 sibling, 1 reply; 35+ messages in thread
From: Stephen J. Turnbull @ 2003-05-20  1:55 UTC (permalink / raw)


>>>>> "sjt" == Stephen J Turnbull <stephen@xemacs.org> writes:

    sjt> OK.  That is satisfactory for XEmacs, and we'll implement
    sjt> that.

    sjt> Unless you say you prefer to do it yourself, I will also
    sjt> submit a patch against GNU Emacs CVS head, and audit the Lisp
    sjt> code in CVS head to make sure there are no surprises from
    sjt> callers with non-default SEPARATORS.

Enclosed are patches for lisp/subr.el and lispref/strings.texi to
implement the API for split-string discussed earlier.

Also enclosed is the result of an audit of uses of split-string in
Emacs CVS (as of about three weeks ago).  I didn't notice any cases
where the changed specification made existing code out-and-out
incorrect, so there are no further patches suggested.  However, I
think a lot of the uses with an explicit SEPARATORS are semantically
dubious without using the OMIT-NULLS flag (and most were semantically
dubious before the change to split-string, because it's at least
theoretically possible for a null string to arise in the interior of
the list).  Most other uses of split-string are dubious in that either
they depend heavily on undocumented implementation details of other
utilities (eg, that the fields in /etc/mtab are separated by exactly
one space) or are not very robust to bogus input.  People who
understand the modules in question might want to take a closer look.

A few I couldn't tell at all without doing a much deeper analysis of
the code than I have time for right now:

./lisp/calendar/todo-mode.el:869:  needs checking
./lisp/eshell/em-pred.el:601:  needs checking
./lisp/mh-e/mh-utils.el:1606:  needs checking
./lisp/textmodes/reftex.el:934:  needs checking
./lisp/textmodes/reftex.el:2161:  needs checking

If you set default-directory to the root of the Emacs hierarchy, the
following function is useful to jump to the reference.  nb. a few of
the references have changed since I started the audit.

(defun sjt/parse-grep-n2 ()
  "Parse `grep -n -#' output for filename and line number."
  (interactive)
  (beginning-of-line)
  (when (re-search-forward "^\\(\\S-+\\):\\([0-9]+\\):")
    (cons (match-string 1) (string-to-number (match-string 2)))))

(defun sjt/parse-grep-n-and-go ()
  "Jump to place specified by `grep -n' output."
  (interactive)
  (let* ((pair (sjt/parse-grep-n2))
         (file (car pair))
         (line (cdr pair)))
    (find-file file)
    (goto-line line)))


lisp/ChangeLog
2003-05-16  Stephen J. Turnbull  <stephen@xemacs.org>

	* subr.el (split-string): Implement specification that splitting
	on explicit separators retains null fields.  Add new argument
	OMIT-NULLS.  Special-case (split-string "a string").

lispref/ChangeLog
2003-05-16  Stephen J. Turnbull  <stephen@xemacs.org>

	* strings.texi (Creating Strings): Update split-string
	specification and examples.

Index: lisp/subr.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/subr.el,v
retrieving revision 1.350
diff -u -r1.350 subr.el
--- lisp/subr.el	24 Apr 2003 23:14:12 -0000	1.350
+++ lisp/subr.el	16 May 2003 10:03:58 -0000
@@ -1792,19 +1792,45 @@
 	(buffer-substring-no-properties (match-beginning num)
 					(match-end num)))))
 
-(defun split-string (string &optional separators)
-  "Splits STRING into substrings where there are matches for SEPARATORS.
-Each match for SEPARATORS is a splitting point.
-The substrings between the splitting points are made into a list
+(defconst split-string-default-separators "[ \f\t\n\r\v]+"
+  "The default value of separators for `split-string'.
+
+A regexp matching strings of whitespace.  May be locale-dependent
+\(as yet unimplemented).  Should not match non-breaking spaces.
+
+Warning: binding this to a different value and using it as default is
+likely to have undesired semantics.")
+
+;; The specification says that if both SEPARATORS and OMIT-NULLS are
+;; defaulted, OMIT-NULLS should be treated as t.  Simplifying the logical
+;; expression leads to the equivalent implementation that if SEPARATORS
+;; is defaulted, OMIT-NULLS is treated as t.
+(defun split-string (string &optional separators omit-nulls)
+  "Splits STRING into substrings bounded by matches for SEPARATORS.
+
+The beginning and end of STRING, and each match for SEPARATORS, are
+splitting points.  The substrings matching SEPARATORS are removed, and
+the substrings between the splitting points are collected as a list,
 which is returned.
-If SEPARATORS is absent, it defaults to \"[ \\f\\t\\n\\r\\v]+\".
 
-If there is match for SEPARATORS at the beginning of STRING, we do not
-include a null substring for that.  Likewise, if there is a match
-at the end of STRING, we don't include a null substring for that.
+If SEPARATORS is non-nil, it should be a regular expression matching text
+which separates, but is not part of, the substrings.  If nil it defaults to
+`split-string-default-separators', normally \"[ \\f\\t\\n\\r\\v]+\", and
+OMIT-NULLS is forced to t.
+
+If OMIT-NULLS is t, zero-length substrings are omitted from the list \(so
+that for the default value of SEPARATORS leading and trailing whitespace
+are effectively trimmed).  If nil, all zero-length substrings are retained,
+which correctly parses CSV format, for example.
+
+Note that the effect of `(split-string STRING)' is the same as
+`(split-string STRING split-string-default-separators t)'.  In the rare
+case that you wish to retain zero-length substrings when splitting on
+whitespace, use `(split-string STRING split-string-default-separators)'.
 
 Modifies the match data; use `save-match-data' if necessary."
-  (let ((rexp (or separators "[ \f\t\n\r\v]+"))
+  (let ((keep-nulls (not (if separators omit-nulls t)))
+	(rexp (or separators split-string-default-separators))
 	(start 0)
 	notfirst
 	(list nil))
@@ -1813,16 +1839,14 @@
 				       (= start (match-beginning 0))
 				       (< start (length string)))
 				  (1+ start) start))
-		(< (match-beginning 0) (length string)))
+		(< start (length string)))
       (setq notfirst t)
-      (or (eq (match-beginning 0) 0)
-	  (and (eq (match-beginning 0) (match-end 0))
-	       (eq (match-beginning 0) start))
+      (if (or keep-nulls (< start (match-beginning 0)))
 	  (setq list
 		(cons (substring string start (match-beginning 0))
 		      list)))
       (setq start (match-end 0)))
-    (or (eq start (length string))
+    (if (or keep-nulls (< start (length string)))
 	(setq list
 	      (cons (substring string start)
 		    list)))


Index: lispref/strings.texi
===================================================================
RCS file: /cvsroot/emacs/emacs/lispref/strings.texi,v
retrieving revision 1.23
diff -u -r1.23 strings.texi
--- lispref/strings.texi	4 Feb 2003 14:47:54 -0000	1.23
+++ lispref/strings.texi	16 May 2003 10:03:59 -0000
@@ -259,30 +259,46 @@
 Lists}.
 @end defun
 
-@defun split-string string separators
+@defun split-string string separators omit-nulls
 This function splits @var{string} into substrings at matches for the regular
 expression @var{separators}.  Each match for @var{separators} defines a
 splitting point; the substrings between the splitting points are made
-into a list, which is the value returned by @code{split-string}.
+into a list, which is the value returned by @code{split-string}.  If
+@var{omit-nulls} is @code{t}, null strings will be removed from the
+result list.  Otherwise, null strings are left in the result.
 If @var{separators} is @code{nil} (or omitted),
-the default is @code{"[ \f\t\n\r\v]+"}.
+the default is the value of @code{split-string-default-separators}.
 
-For example,
+@defvar split-string-default-separators
+The default value of @var{separators} for @code{split-string}, initially
+@samp{"[ \f\t\n\r\v]+"}.
+
+As a special case, when @var{separators} is @code{nil} (or omitted),
+null strings are always omitted from the result.  Thus:
 
 @example
-(split-string "Soup is good food" "o")
-@result{} ("S" "up is g" "" "d f" "" "d")
-(split-string "Soup is good food" "o+")
-@result{} ("S" "up is g" "d f" "d")
+(split-string "  two words ")
+@result{} ("two" "words")
+@end example
+
+The result is not @samp{("" "two" "words" "")}, which would rarely be
+useful.  If you need such a result, use an explicit value for
+@var{separators}:
+
+@example
+(split-string "  two words " split-string-default-separators)
+@result{} ("" "two" "words" "")
 @end example
 
-When there is a match adjacent to the beginning or end of the string,
-this does not cause a null string to appear at the beginning or end
-of the list:
+More examples:
 
 @example
-(split-string "out to moo" "o+")
-@result{} ("ut t" " m")
+(split-string "Soup is good food" "o")
+@result{} ("S" "up is g" "" "d f" "" "d")
+(split-string "Soup is good food" "o" t)
+@result{} ("S" "up is g" "d f" "d")
+(split-string "Soup is good food" "o+")
+@result{} ("S" "up is g" "d f" "d")
 @end example
 
 Empty matches do count, when not adjacent to another match:

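The three cases that the docstring and manual text above distinguish can be sketched as follows. This is an illustration only, not part of the patch: it assumes an Emacs whose `split-string' accepts the optional OMIT-NULLS argument as specified above, and `my-csv-fields' and `my-words' are hypothetical helper names.

```elisp
;; Illustrative sketch, not part of the patch.  Assumes `split-string'
;; accepts the optional OMIT-NULLS argument with the semantics above.
;; `my-csv-fields' and `my-words' are hypothetical helper names.

(defun my-csv-fields (line)
  "Split LINE on commas, retaining zero-length fields."
  ;; Explicit SEPARATORS with OMIT-NULLS nil: null fields are kept.
  (split-string line ","))

(defun my-words (text)
  "Split TEXT on whitespace, discarding zero-length substrings."
  ;; Both arguments defaulted: OMIT-NULLS is treated as t.
  (split-string text))

(my-csv-fields ",,data,,")        ; => ("" "" "data" "" "")
(my-words "  two words ")         ; => ("two" "words")
(split-string ",,data,," "," t)   ; => ("data")
```
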
bash-2.05b$ find . -name '*.el' | xargs fgrep -2 -n split-string /dev/null
./lisp/apropos.el:267:  want OMIT-NULLS t
./lisp/calendar/todo-mode.el:869:  needs checking
./lisp/cvs-status.el:286:  new semantics preferred; no error checking
./lisp/diff-mode.el:1047:  OK, double default
./lisp/ediff-diff.el:1143:  OK
./lisp/emacs-lisp/authors.el:460:  double default, OK
./lisp/emacs-lisp/crm.el:419:  new semantics preferred; no error checking
./lisp/emacs-lisp/crm.el:605:  new semantics preferred; no error checking
./lisp/emacs-lisp/lisp-mnt.el:412:  want OMIT-NULLS t
./lisp/emacs-lisp/unsafep.el:111:  mentioned in comment, not used
./lisp/eshell/em-cmpl.el:403:  new semantics preferred; no error checking
./lisp/eshell/em-ls.el:257:  OK, double default
./lisp/eshell/em-pred.el:601:  needs checking
./lisp/eshell/esh-util.el:228:  want OMIT-NULLS t
./lisp/eshell/esh-util.el:449:  new semantics preferred; no error checking
./lisp/eshell/esh-var.el:568:  new semantics preferred; no error checking
./lisp/files.el:4254:  double default, OK
./lisp/filesets.el:1202:  new semantics preferred; no error checking
./lisp/gdb-ui.el:1001:  new semantics preferred; no error checking
./lisp/gnus/gnus-art.el:4645:  new semantics preferred; no error checking
./lisp/gnus/gnus-group.el:3798:  OK
./lisp/gnus/gnus.el:2679:  OK
./lisp/gnus/gnus.el:2681:  OK
./lisp/gnus/mailcap.el:367:  OK, could use OMIT-NULLS t instead
./lisp/gnus/mailcap.el:502:  want OMIT-NULLS t
./lisp/gnus/mailcap.el:648:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mailcap.el:702:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mailcap.el:870:  OK, could use OMIT-NULLS t instead
./lisp/gnus/mailcap.el:940:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/message.el:4701:  want OMIT-NULLS t
./lisp/gnus/mm-decode.el:55:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mm-decode.el:57:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mm-decode.el:264:  new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mm-decode.el:363:  OK, double default
./lisp/gnus/mml.el:307:	 new semantics preferred; no error checking (splitting MIME content type)
./lisp/gnus/mml.el:337:	 ditto
./lisp/gnus/nnslashdot.el:364:  OK, double default
./lisp/gnus/nnslashdot.el:488:  OK, could use OMIT-NULLS t instead
./lisp/gnus/nnultimate.el:176:  OK, could use OMIT-NULLS t instead
./lisp/gnus/pop3.el:249:  want OMIT-NULLS t
./lisp/gnus/pop3.el:346:  want OMIT-NULLS t
./lisp/gnus/pop3.el:347:  want OMIT-NULLS t
./lisp/gnus/pop3.el:409:  want OMIT-NULLS t
./lisp/gnus/rfc2231.el:131:  new semantics preferred; no error checking (splitting encoded word into locale info)
./lisp/gud.el:1817:  OK
./lisp/gud.el:1847:  OK
./lisp/gud.el:2288:  OK, double default
./lisp/gud.el:2813:  OK
./lisp/hexl.el:635:  double default, OK
./lisp/hexl.el:652:  double default, OK
./lisp/ido.el:2502:  want OMIT-NULLS t
./lisp/ido.el:2868:  want OMIT-NULLS t
./lisp/info.el:387:  want OMIT-NULLS t
./lisp/info.el:390:  want OMIT-NULLS t
./lisp/mail/rfc2368.el:137:  OK
./lisp/mail/rfc2368.el:144:  new semantics preferred; no error checking
./lisp/mail/smtpmail.el:602:  want OMIT-NULLS t
./lisp/mh-e/mh-alias.el:156:  want OMIT-NULLS t
./lisp/mh-e/mh-alias.el:289:  OK
./lisp/mh-e/mh-alias.el:469:  OK
./lisp/mh-e/mh-comp.el:374:  OK, double default
./lisp/mh-e/mh-e.el:2164:  OK, double default
./lisp/mh-e/mh-index.el:475:  OK, double default
./lisp/mh-e/mh-seq.el:966:  OK, double default
./lisp/mh-e/mh-utils.el:1606:  needs checking
./lisp/net/eudc-export.el:126:  OK
./lisp/net/eudc.el:161:  Emacs 21 compatible
./lisp/net/eudc.el:419:	 want OMIT-NULLS t
./lisp/net/eudc.el:442:	 check this
./lisp/net/eudc.el:833:	 want OMIT-NULLS t
./lisp/net/eudcb-ldap.el:90:  OK
./lisp/net/ldap.el:415:	 new semantics preferred; no error checking
./lisp/net/ldap.el:420:	 OK
./lisp/net/tramp.el:5658:  check this
./lisp/net/tramp.el:6257:  tramp-split-string is not quite emacs compatible
./lisp/pcmpl-cvs.el:175:  new semantics preferred; no error checking
./lisp/pcmpl-gnu.el:127:  OK, double default
./lisp/pcmpl-linux.el:46:  double default, OK
./lisp/pcmpl-linux.el:88:  want OMIT-NULLS t
./lisp/pcmpl-linux.el:101:  want OMIT-NULLS t
./lisp/pcmpl-rpm.el:39:  OK, double default
./lisp/pcmpl-rpm.el:46:  OK, double default
./lisp/pcmpl-unix.el:89:  new semantics preferred; no error checking
./lisp/pcvs-util.el:227:  want OMIT-NULLS t
./lisp/pcvs-util.el:228:  want OMIT-NULLS t
./lisp/progmodes/ada-prj.el:590:  want OMIT-NULLS t
./lisp/progmodes/ada-xref.el:207:  new semantics preferred; no error checking
./lisp/progmodes/fortran.el:267:  want OMIT-NULLS t
./lisp/progmodes/idlw-shell.el:1734:  could use new split-string with OMIT-NULLS t
./lisp/progmodes/idlwave.el:3702:  prior XEmacs-compatible, could use new split-string
./lisp/progmodes/inf-lisp.el:285:  double default, OK
./lisp/progmodes/vhdl-mode.el:13030:  new semantics preferred; no error checking
./lisp/progmodes/vhdl-mode.el:13171:  new semantics preferred; no error checking
./lisp/progmodes/vhdl-mode.el:13698:  new semantics preferred; no error checking
./lisp/progmodes/vhdl-mode.el:13701:  new semantics preferred; no error checking
./lisp/textmodes/bibtex.el:2665:  new semantics preferred; no error checking
./lisp/textmodes/reftex-cite.el:192:  Gone?
./lisp/textmodes/reftex-cite.el:373:  new semantics preferred; no error checking
./lisp/textmodes/reftex-cite.el:383:  new semantics preferred; no error checking
./lisp/textmodes/reftex-cite.el:445:  OK
./lisp/textmodes/reftex-cite.el:863:  new semantics preferred; no error checking
./lisp/textmodes/reftex-cite.el:961:  new semantics preferred; no error checking
./lisp/textmodes/reftex-index.el:1552:  new semantics preferred; no error checking
./lisp/textmodes/reftex-index.el:1685:  want OMIT-NULLS t
./lisp/textmodes/reftex-index.el:1734:  OK, double default
./lisp/textmodes/reftex-index.el:1748:  OK, double default
./lisp/textmodes/reftex-index.el:1755:	OK, double default
./lisp/textmodes/reftex-index.el:1762:  new semantics preferred; no error checking
./lisp/textmodes/reftex-index.el:1818:	new semantics preferred; no error checking
./lisp/textmodes/reftex-parse.el:343:  new semantics preferred; no error checking
./lisp/textmodes/reftex-parse.el:482:  OK, mapconcat used
./lisp/textmodes/reftex-parse.el:990:  new semantics preferred; no error checking
./lisp/textmodes/reftex.el:934:  needs checking
./lisp/textmodes/reftex.el:1455:  OK, double default
./lisp/textmodes/reftex.el:1488:  OK, double default
./lisp/textmodes/reftex.el:1556:  OK, could use OMIT-NULLS t instead
./lisp/textmodes/reftex.el:2161:  needs checking (uses explicit re or explicit ws)
./lisp/vc-cvs.el:789:	 new semantics preferred; requires rewrite to use
./lisp/xml.el:432:  OK
./lisp/xml.el:436:  OK
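
For the call sites marked "want OMIT-NULLS t", the change would presumably look something like the sketch below. The call site is hypothetical and not taken from any file in the listing; `my-path-entries-old' and `my-path-entries' are made-up names, and the code assumes a `split-string' that accepts the OMIT-NULLS argument.

```elisp
;; Hypothetical call site, not taken from any file in the listing.
;; Assumes `split-string' accepts the optional OMIT-NULLS argument.

(defun my-path-entries-old (path)
  "Split PATH on colons, deleting empty entries by hand."
  (delete "" (split-string path ":")))

(defun my-path-entries (path)
  "Split PATH on colons, letting `split-string' omit empty entries."
  (split-string path ":" t))

(my-path-entries "/usr/bin::/bin:")   ; => ("/usr/bin" "/bin")
```

Both helpers return the same list; the second simply states the intent in the call itself.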



-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Rationale for split-string?
  2003-05-20  1:55               ` Stephen J. Turnbull
@ 2003-05-22 15:00                 ` Kai Großjohann
  0 siblings, 0 replies; 35+ messages in thread
From: Kai Großjohann @ 2003-05-22 15:00 UTC (permalink / raw)


"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Enclosed are patches for lisp/subr.el and lispref/strings.texi to
> implement the API for split-string discussed earlier.

I wonder what's going to happen with this?  It hasn't been committed,
AFAICS.  Does anyone know?
-- 
This line is not blank.


* Re: Rationale for split-string?
@ 2003-05-20  3:11 Bill Wohler
  0 siblings, 0 replies; 35+ messages in thread
From: Bill Wohler @ 2003-05-20  3:11 UTC (permalink / raw)
  Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> A few I couldn't tell at all without doing a much deeper analysis of
> the code than I have time for right now:
>
> ./lisp/calendar/todo-mode.el:869:  needs checking
> ./lisp/eshell/em-pred.el:601:  needs checking
> ./lisp/mh-e/mh-utils.el:1606:  needs checking

Thanks very much for checking. I believe that Satyaki has already fixed
this in CVS MH-E so that it would be compatible with present and future
versions of Emacs as well as XEmacs. Given our recent history, this
should find its way into CVS Emacs in a few weeks (in MH-E 7.4).

> ./lisp/mh-e/mh-alias.el:156:  want OMIT-NULLS t
> ./lisp/mh-e/mh-alias.el:289:  OK
> ./lisp/mh-e/mh-alias.el:469:  OK
> ./lisp/mh-e/mh-comp.el:374:  OK, double default
> ./lisp/mh-e/mh-e.el:2164:  OK, double default
> ./lisp/mh-e/mh-index.el:475:  OK, double default
> ./lisp/mh-e/mh-seq.el:966:  OK, double default
> ./lisp/mh-e/mh-utils.el:1606:  needs checking

--
Bill Wohler <wohler@newt.com>  http://www.newt.com/wohler/  GnuPG ID:610BD9AD
Maintainer of comp.mail.mh FAQ and MH-E. Vote Libertarian!
If you're passed on the right, you're in the wrong lane.


end of thread, other threads:[~2003-05-22 15:00 UTC | newest]

Thread overview: 35+ messages
2003-04-17  9:06 Rationale for split-string? Stephen J. Turnbull
2003-04-17 11:30 ` Stefan Reichör
2003-04-18  1:54   ` Richard Stallman
2003-04-18  2:59     ` Steve Youngs
2003-04-17 17:44 ` Stefan Monnier
2003-04-17 19:32   ` Luc Teirlinck
2003-04-18 11:50   ` Stephen J. Turnbull
2003-04-18 14:17     ` Stefan Monnier
2003-04-19  8:18       ` Stephen J. Turnbull
2003-04-19 13:35     ` Richard Stallman
2003-04-19  4:14   ` Richard Stallman
2003-04-19  8:55     ` Stephen J. Turnbull
2003-04-21  0:59       ` Richard Stallman
2003-04-21  1:55         ` Luc Teirlinck
2003-04-21 10:58         ` Stephen J. Turnbull
2003-04-21 21:11           ` Luc Teirlinck
2003-04-21 23:43             ` Miles Bader
2003-04-22  3:26               ` Luc Teirlinck
2003-04-22  4:09                 ` Jerry James
2003-04-22  8:15                   ` Eli Zaretskii
2003-04-22 13:22                     ` Stephen J. Turnbull
2003-04-22 14:38                       ` Jerry James
2003-04-22 12:56                   ` Luc Teirlinck
2003-04-22 14:56                     ` Jerry James
2003-04-22 15:27                       ` Luc Teirlinck
2003-04-22 13:19                 ` Stephen J. Turnbull
2003-04-22 13:39                   ` Miles Bader
2003-04-22 13:51                   ` Luc Teirlinck
2003-04-22 16:26                   ` Luc Teirlinck
2003-04-23  1:00           ` Richard Stallman
2003-04-23  4:09             ` Stephen J. Turnbull
2003-04-24 23:12               ` Richard Stallman
2003-05-20  1:55               ` Stephen J. Turnbull
2003-05-22 15:00                 ` Kai Großjohann
  -- strict thread matches above, loose matches on Subject: below --
2003-05-20  3:11 Bill Wohler
