From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luc Teirlinck
Newsgroups: gmane.emacs.devel,gmane.emacs.xemacs.design
Subject: Re: Rationale for split-string?
Date: Mon, 21 Apr 2003 16:11:21 -0500 (CDT)
Message-ID: <200304212111.h3LLBLK11879@eel.dms.auburn.edu>
References: <87brz57at2.fsf@tleepslib.sk.tsukuba.ac.jp>
 <200304171744.h3HHiJCx009215@rum.cs.yale.edu>
 <87adem27ey.fsf@tleepslib.sk.tsukuba.ac.jp>
 <87ist8yv4n.fsf@tleepslib.sk.tsukuba.ac.jp>
In-reply-to: <87ist8yv4n.fsf@tleepslib.sk.tsukuba.ac.jp> (stephen@xemacs.org)
Original-To: stephen@xemacs.org
Original-cc: xemacs-design@xemacs.org
Original-cc: rms@gnu.org
Cc: emacs-devel@gnu.org

Stephen Turnbull wrote:

    How about:

    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;
    ;; one function, three arguments

    (defun split-string (string &optional separators omit-nulls)
      "Splits STRING into substrings bounded by matches for SEPARATORS.

    The beginning and end of STRING, and each match for SEPARATORS, are
    splitting points.  The substrings between the splitting points are
    collected in a list, which is returned.  (The substrings matching
    SEPARATORS are removed.)

    If SEPARATORS is nil, it defaults to \"[ \f\t\n\r\v]+\".

    If OMIT-NULLs is t, zero-length substrings are omitted from the list
    (so that for the default value of SEPARATORS leading and trailing
    whitespace are trimmed).  If nil, all zero-length substrings are
    retained, which correctly parses CSV format, for example."

      ;; implementation
      )

There are two problems with this.  First of all, it would break tons
of existing Emacs code.  Secondly, the defaults for SEPARATORS and for
OMIT-NULLs do not match.
Thus, the most routine call of (split-string string) would produce
nonsensical results in the case of leading or trailing whitespace.
Something like

    (split-string string &optional separators keep-nulls)

that is, the same as your proposal but with the roles of nil and t
reversed, would take care of the second objection and also break less
existing Emacs code (but probably still enough to worry about).  Of
course, the reduction in broken Emacs code would probably come at the
expense of breaking existing XEmacs code.

With your proposal, we would have to replace plenty of occurrences of
(split-string string) in Emacs with (split-string string nil t).  To
do that automatically, we would have to change all of them.  There is
plenty of Elisp code that is not included in either the Emacs or
XEmacs distributions, but that might still be important to plenty of
people.  We cannot change that code.  Code compatible between
different Emacs versions would have to become more complex.  The
reverse version of your proposal would eliminate this part of the
problem, but probably produce a similar problem for XEmacs.

With the reverse proposal above, we would not have to worry about
Emacs calls to split-string with the default value for SEPARATORS, but
one would still have to go through all occurrences of split-string
with non-default values of SEPARATORS, at the very least in all .el
files in the Lisp directory and all its subdirectories, and very
carefully check which ones the change would break and fix all those.
(Personally, I do not have the time to do that.)  Even if somebody
finds the time to do all of this, we cannot check and fix Elisp code
not included in the Emacs or XEmacs distributions.
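
To make the objection about mismatched defaults concrete, here is a
minimal sketch.  The names my-split-keep-nulls and my-split-words are
hypothetical and exist in neither Emacs nor XEmacs; the first one only
imitates the "retain all zero-length substrings" behavior so that the
example does not depend on any particular version of split-string.  It
assumes SEPARATORS never matches the empty string.

```elisp
;; Hypothetical helper (not part of Emacs or XEmacs): split STRING at
;; every match for SEPARATORS and keep every zero-length substring.
;; Assumes SEPARATORS never matches the empty string; otherwise the
;; loop below would not terminate.
(defun my-split-keep-nulls (string separators)
  "Split STRING at matches for SEPARATORS, keeping empty substrings."
  (let ((start 0)
        (pieces nil))
    (while (string-match separators string start)
      (push (substring string start (match-beginning 0)) pieces)
      (setq start (match-end 0)))
    ;; Whatever follows the last match, possibly the empty string.
    (push (substring string start) pieces)
    (nreverse pieces)))

;; Hypothetical wrapper: the kind of extra filtering that portable
;; code would need if empty substrings were retained by default.
(defun my-split-words (string)
  "Split STRING at whitespace and drop every empty substring."
  (delete "" (my-split-keep-nulls string "[ \f\t\n\r\v]+")))

;; With the default whitespace separators and all nulls retained,
;; leading and trailing whitespace yields empty strings at both ends:
(my-split-keep-nulls " two words " "[ \f\t\n\r\v]+")
;; => ("" "two" "words" "")

;; The wrapper gives what a routine call is usually meant to produce:
(my-split-words " two words ")
;; => ("two" "words")
```

The wrapper is the sort of added complexity meant above: every caller
that wants plain words would have to filter the result itself, or go
through a compatibility function of this kind.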
The point of my proposal (possible values "all", "none" and "edges"
for omit-nulls, with nil being equivalent to "edges" in Emacs and to
"none" in XEmacs) was to avoid breaking any existing Emacs or XEmacs
code while still making it trivial to use split-string in a way that
works identically in Emacs and XEmacs.

Again, in that proposal, only "edges" as an additional value for
omit-nulls is necessary to avoid breaking existing Emacs code.  I only
mentioned "beginning" and "end" as luxury possibilities.  I know of
software packages that use the "end" version, and the "end" version
actually does make a lot of sense in plenty of situations, like
splitting a file or buffer into lines, where a leading newline does
represent an empty line, but a trailing one does not represent an
additional empty line following it.  The "end" (as well as the
"beginning") behavior is, however, trivial to obtain from the "none"
behavior, so that it would be a luxury.  ("end" would be a nice
luxury, "beginning" would probably be a "luxury luxury" for symmetry
with "end".)

Sincerely,

Luc.
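
As a postscript, the omit-nulls values discussed above can be sketched
roughly as follows.  This is only an illustration under stated
assumptions, not a proposed implementation: the name my-split-string
is hypothetical, the values are written as symbols rather than
strings, SEPARATORS is required rather than optional, and SEPARATORS
is assumed never to match the empty string.

```elisp
;; Hypothetical sketch (not the Emacs or XEmacs split-string).
;; OMIT-NULLS is assumed to be one of the symbols `all', `none',
;; `edges', `beginning' or `end'.
(defun my-split-string (string separators omit-nulls)
  "Split STRING at matches for SEPARATORS, filtering empty substrings.
OMIT-NULLS selects which zero-length substrings are dropped."
  (let ((start 0)
        (pieces nil))
    ;; Collect every substring between matches, empty ones included.
    ;; This alone is the `none' behavior.
    (while (string-match separators string start)
      (push (substring string start (match-beginning 0)) pieces)
      (setq start (match-end 0)))
    (push (substring string start) pieces)
    ;; PIECES is still reversed here, so its head is the last substring.
    (when (and (memq omit-nulls '(edges end))
               (equal (car pieces) ""))
      (setq pieces (cdr pieces)))
    (setq pieces (nreverse pieces))
    ;; Now the head is the first substring.
    (when (and (memq omit-nulls '(edges beginning))
               (equal (car pieces) ""))
      (setq pieces (cdr pieces)))
    (if (eq omit-nulls 'all)
        (delete "" pieces)
      pieces)))

;; Splitting text into lines: a leading newline is an empty line, a
;; trailing newline is not followed by another empty line.
(my-split-string "\nfirst\n\nlast\n" "\n" 'none)
;; => ("" "first" "" "last" "")
(my-split-string "\nfirst\n\nlast\n" "\n" 'end)
;; => ("" "first" "" "last")
(my-split-string "\nfirst\n\nlast\n" "\n" 'edges)
;; => ("first" "" "last")
(my-split-string "\nfirst\n\nlast\n" "\n" 'all)
;; => ("first" "last")
```

In this sketch the `end' and `beginning' cases only drop one element
from the `none' result, which is why they count as luxuries.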