From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.xemacs.design,gmane.emacs.devel
Subject: Re: Rationale for split-string?
Date: Tue, 22 Apr 2003 22:19:31 +0900
Organization: The XEmacs Project
Message-ID: <87smsay8ik.fsf@tleepslib.sk.tsukuba.ac.jp>
References: <87brz57at2.fsf@tleepslib.sk.tsukuba.ac.jp>
	<200304171744.h3HHiJCx009215@rum.cs.yale.edu>
	<87adem27ey.fsf@tleepslib.sk.tsukuba.ac.jp>
	<87ist8yv4n.fsf@tleepslib.sk.tsukuba.ac.jp>
	<200304212111.h3LLBLK11879@eel.dms.auburn.edu>
	<20030421234347.GA12507@gnu.org>
	<200304220326.h3M3Q1912252@eel.dms.auburn.edu>
Cc: miles@gnu.org, emacs-devel@gnu.org, xemacs-design@xemacs.org, rms@gnu.org
Original-To: Luc Teirlinck
In-Reply-To: <200304220326.h3M3Q1912252@eel.dms.auburn.edu> (Luc Teirlinck's message of "Mon, 21 Apr 2003 22:26:01 -0500 (CDT)")

>>>>> "Luc" == Luc Teirlinck writes:

    Luc> Miles Bader wrote:

    mb> I think Stephen's formulation is very natural, in that you
    mb> usually want OMIT-NULLS to be t if you're splitting on a
    mb> non-whitespace string.

Miles, here you meant OMIT-NULLS to be nil, right?

I think Miles's proposal to default the one-argument form of
`split-string' to GNU behavior and have the two-argument form as
XEmacs's, with the three-argument form for precise control, is a good
compromise.  Add

(defconst split-string-default-separators "[ \f\t\n\r\v]+"
  "The default value of separators for `split-string'.

A regexp matching strings of whitespace.  May be locale-dependent
\(as yet unimplemented).  Should not match non-breaking spaces.")

and the current XEmacs behavior is very naturally available with

(split-string string split-string-default-separators)

(although the fact that that means something different from
`(split-string string)' is definitely a wart).
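
The three calling forms in the compromise above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the subr.el definition from either Emacs: the names `sketch-split-string' and `sketch-default-separators' are hypothetical, and the sketch assumes SEPARATORS never matches the empty string.

```elisp
;; Hypothetical names throughout; this is NOT the real `split-string'.
(defconst sketch-default-separators "[ \f\t\n\r\v]+"
  "Regexp matching runs of whitespace (mirrors the defconst above).")

(defun sketch-split-string (string &optional separators omit-nulls)
  "Split STRING at matches of the regexp SEPARATORS.
With SEPARATORS nil, split on whitespace and drop empty strings
\(the one-argument, GNU-style form).  With SEPARATORS non-nil, keep
empty strings unless OMIT-NULLS is non-nil."
  (let ((omit (if separators omit-nulls t))
        (rexp (or separators sketch-default-separators))
        (start 0)
        (pieces nil))
    (while (string-match rexp string start)
      ;; Assumption: a separator match is never empty; bail out if it is,
      ;; since the loop below would otherwise never advance.
      (when (= (match-beginning 0) (match-end 0))
        (error "SEPARATORS must not match the empty string"))
      (push (substring string start (match-beginning 0)) pieces)
      (setq start (match-end 0)))
    ;; Whatever follows the last separator (possibly "") is a piece too.
    (push (substring string start) pieces)
    (setq pieces (nreverse pieces))
    (if omit (delete "" pieces) pieces)))

(sketch-split-string "  two words ")
;; => ("two" "words")
(sketch-split-string "  two words " sketch-default-separators)
;; => ("" "two" "words" "")
(sketch-split-string ":a::b:" ":")
;; => ("" "a" "" "b" "")
(sketch-split-string ":a::b:" ":" t)
;; => ("a" "b")
```

The first two calls show the wart mentioned above: passing the default separators explicitly gives a different result from passing nothing, because only the one-argument form drops empty strings implicitly.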
------------------------------------------------------------------------

Back to our regularly scheduled controversy on principles:

    Luc> First of all, I am not worried about Stephen's formulation
    Luc> being unnatural (although the original formulation actually
    Luc> would produce unnatural results in the default case), but
    Luc> about it breaking existing code.

GNU Emacs made the change (viz. cvs diff -r EMACS_20_2 -r EMACS_20_4
subr.el) without worrying sufficiently about breaking existing code
(see Stefan Reichör's post here, or run XEmacs's regression test suite
on XEmacs 21.5).  I don't see why that should be a barrier to
reverting to the old, regular, behavior now.

Further, as far as GNU Emacs itself goes, I see your theory and raise
you a full-tree patch.  I volunteer to revise the code and fix the
callers in all GNU Emacs code distributed on the mainline.  (I've
already requested papers from rms.)

Sure, we can't guarantee that third party code won't get broken, but
Jerry James has anted an audit of all XEmacs code including the
packages, a significant fraction of 3rd party Emacs Lisp code.
Nothing there will break, although once we get this settled, many
packages can have their local versions of `split-string' either thrown
out or turned into trivial defsubsts around the core version.

Want to match Jerry's effort with some facts here?  Find us some
callers, we'll send patches to their maintainers.

    Luc> I believe you are underestimating the level of generality of
    Luc> split-string and the wild heterogeneity of its applications.

Et tu, Luc.  You don't imagine using split-string to parse Makefiles
or Python code[1], to detect trailing whitespace (perhaps generated by
older auto-fill implementations to mark sentence breaks) that violates
coding standards, etc.  (Not surprising, since GNU Emacs 21.x can't do
those things using `split-string'.)
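
To make the trailing-whitespace case concrete, here is a minimal sketch.  It assumes a `split-string' that keeps empty strings when SEPARATORS is passed explicitly (the regular behavior argued for here); `sketch-trailing-whitespace-p' is a hypothetical name, not taken from any package.

```elisp
;; Hypothetical helper.  Assumes `split-string' preserves empty strings
;; when SEPARATORS is given explicitly and OMIT-NULLS is nil.
(defun sketch-trailing-whitespace-p (line)
  "Return non-nil if LINE ends in a run of spaces or tabs."
  (let ((fields (split-string line "[ \t]+")))
    ;; A separator matching at the very end of LINE leaves a final "".
    ;; The (cdr fields) check excludes the empty line, which splits to ("").
    (and (cdr fields)
         (string= (car (last fields)) ""))))
```

With a `split-string' that silently drops empty strings at the ends, the final "" never appears and this test cannot be written in terms of the split result at all.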

Since generality and heterogeneity are much better served by simple
regular interfaces, what you are really arguing is quite the opposite.
I.e., that there's only one important application (splitting into
tokens separated by non-significant whitespace).  And you want the
`split-string' API optimized for that and very similar applications by
default, even though that means that `split-string's non-default
behavior looks totally schizophrenic by comparison.

A lot of people agree with you (including rms AFAICT), but others
don't.  Many XEmacs people disagree strongly.  (They prefer
regularity.)

    Luc> It is by no means whatsoever true that except in the
    Luc> whitespace case you would want to keep all null matches.  If
    Luc> SEPARATORS is a "terminator character", say newline,

Note that Miles's proposal would actually give the behavior you want
in `(split-string string "\n")'.  (Admittedly, you'd like
`(split-string string "\n" 'end)' even better.)  Point for Miles!

But you are exactly right: sometimes one wants it one way, and
sometimes the other.  It is this _irreconcilable_ difference that
leads me to strongly prefer separate APIs, one which imposes
stream-of-token semantics, and one which merely splits strings.  I
think `split-string' is a more natural name for the latter.

    Luc> The "however" is that we are not defining a *new* function
    Luc> but *re*defining an *existing* function, an often used and
    Luc> extremely general existing function.  That is all but
    Luc> guaranteed to produce a wild variety of bugs.

Please consider the history of the change.  You're inaccurate on all
counts.  We propose _reverting_ what is already a redefinition.
Because the redefined function is _less general_ than the original,
it's _used less often_ than it could be.  (Jerry James's audit of
XEmacs and package code demonstrates this.)  And it won't "produce"
bugs, it will _exchange_ a new set of unknown bugs (which is likely to
be small everywhere except in code very specific to GNU Emacs 21) for
a set of existing bugs, which everybody agrees need to be fixed.

So the question basically boils down to whether it makes sense to have
a regular, easily understood definition with exceptions restricted to
a few very clear cases with consensus support, or to aggressively make
"plausible" exceptions.  The last time GNU Emacs did the latter with
this function, it clearly screwed up.

    Luc> In fact let us assume, for the sake of argument, that Stephen
    Luc> and you are 100% right.  That would mean that any correct
    Luc> existing code, using the present Emacs split-string with a
    Luc> non-nil SEPARATORS, checks for empty matches at the beginning
    Luc> and end and adds any such matches to the split-string output
    Luc> to correct the "bug" in the present split-string.  After
    Luc> Stephen's change, any empty match at the beginning and end of
    Luc> the string will produce not one, but two empty strings.

That's silly; what anybody sane would do in the face of GNU Emacs's
demonstrated willingness to change semantics of such a fundamental
function is to copy the old definition into their own code.  It would
probably be shorter, and surely simpler and faster, than the gross
hack you propose.

Footnotes:
[1]  ;; `python-single-indentation' is not defined here; presumably a
     ;; regexp matching one level of indentation.
     (defun python-parse-indentation (line)
       (let ((i 0)
             (line (split-string line python-single-indentation)))
         ;; Each leading empty string stands for one indentation level.
         (while (string= (car line) "")
           (setq i (1+ i))
           (setq line (cdr line)))
         (cons i line)))

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 Ask not how you can "do" free software business;
  ask what your business can "do for" free software.