From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luc Teirlinck
Newsgroups: gmane.emacs.xemacs.design,gmane.emacs.devel
Subject: Re: Rationale for split-string?
Date: Mon, 21 Apr 2003 22:26:01 -0500 (CDT)
Message-ID: <200304220326.h3M3Q1912252@eel.dms.auburn.edu>
References: <87brz57at2.fsf@tleepslib.sk.tsukuba.ac.jp>
 <200304171744.h3HHiJCx009215@rum.cs.yale.edu>
 <87adem27ey.fsf@tleepslib.sk.tsukuba.ac.jp>
 <87ist8yv4n.fsf@tleepslib.sk.tsukuba.ac.jp>
 <200304212111.h3LLBLK11879@eel.dms.auburn.edu>
 <20030421234347.GA12507@gnu.org>
In-reply-to: <20030421234347.GA12507@gnu.org> (message from Miles Bader
 on Mon, 21 Apr 2003 19:43:47 -0400)
Original-To: miles@gnu.org
Cc: stephen@xemacs.org, emacs-devel@gnu.org, xemacs-design@xemacs.org,
 rms@gnu.org
Xref: main.gmane.org gmane.emacs.xemacs.design:2090 gmane.emacs.devel:13339

Miles Bader wrote:

    I think Stephen's formulation is very natural, in that you usually
    want OMIT-NULLS to be t if you're splitting on a non-whitespace
    string.

First of all, I am not worried about Stephen's formulation being
unnatural (although the original formulation actually would produce
unnatural results in the default case), but about it breaking existing
code.

I believe you are underestimating the level of generality of
split-string and the wild heterogeneity of its applications.  It is by
no means whatsoever true that except in the whitespace case you would
want to keep all null matches.

If SEPARATORS is a "terminator character", say newline, then a null
match at the beginning counts.  There is no reason you would start the
string with a terminator other than to explicitly terminate an empty
string.  The empty match at the end does not count, because the
terminator at that place just terminates the previous match.  This is,
for instance, how you would want to split a buffer, or a file, or user
input, into lines.  The way you implement that with the current
split-string is to first check for an initial terminator and, if there
is one, prepend an empty string to the split-string output.
With the proposed new split-string, you delete the empty match at the
end from the split-string output.  That is actually easier.
However...

The "however" is that we are not defining a *new* function but
*re*defining an *existing* function, an often used and extremely
general existing function.  That is all but guaranteed to produce a
wild variety of bugs.

In fact, let us assume, for the sake of argument, that Stephen and you
are 100% right.  That would mean that any correct existing code, using
the present Emacs split-string with a non-nil SEPARATORS, checks for
empty matches at the beginning and end and adds any such matches to
the split-string output to correct the "bug" in the present
split-string.  After Stephen's change, any empty match at the
beginning and end of the string will produce not one, but two empty
strings.

Sincerely,

Luc.
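
The terminator case and the breakage argued above can be modelled in a
few lines.  What follows is only a sketch in Python, not Emacs Lisp:
the names split_current, split_proposed, legacy_lines and new_lines
are hypothetical, Python's re.split stands in for Emacs regexp
matching, and the "current" behaviour is simplified to "drop one empty
string at each end", which is how the message describes it.

```python
import re


def split_current(string, separators):
    """Simplified model of the 'current' behaviour described in the
    message: empty strings at the very beginning and end are dropped."""
    parts = re.split(separators, string)
    if parts and parts[0] == "":
        parts = parts[1:]
    if parts and parts[-1] == "":
        parts = parts[:-1]
    return parts


def split_proposed(string, separators):
    """Simplified model of the proposed behaviour for a non-nil
    SEPARATORS: every empty string is kept."""
    return re.split(separators, string)


def legacy_lines(text, split):
    """Existing caller written against the current behaviour: restore
    the empty first line that the splitter is expected to drop."""
    parts = split(text, "\n")
    if text.startswith("\n"):
        parts = [""] + parts
    return parts


def new_lines(text):
    """Caller written against the proposed behaviour: discard the empty
    string produced by the final terminator."""
    parts = split_proposed(text, "\n")
    if parts and parts[-1] == "":
        parts = parts[:-1]
    return parts


text = "\nfoo\nbar\n"  # three terminated lines: "", "foo", "bar"

print(legacy_lines(text, split_current))   # ['', 'foo', 'bar']
print(new_lines(text))                     # ['', 'foo', 'bar']
# The same legacy caller, run unchanged against the proposed splitter:
print(legacy_lines(text, split_proposed))  # ['', '', 'foo', 'bar', '']
```

Under these assumptions the legacy caller is correct only with the
splitter it was written for.  Run against the proposed one, it yields
two empty strings at the beginning and a stray one at the end.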