From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Bug in some calls to split-string.
Date: Sat, 20 Jul 2013 16:34:59 +0900
Message-ID: <87hafpr8f0.fsf@uwakimon.sk.tsukuba.ac.jp>
To: rms@gnu.org
Cc: emacs-devel@gnu.org

Richard Stallman writes:

 > I discovered a call to split-string that looked like this:
 >
 >     (split-string recipients "[ \t\n]*,[ \t\n]*")
 >
 > This has a bug: it fails to discard whitespace from the
 > start of the first substring or the end of the last.

Whether this is a bug depends on whether you can rely on the caller
to deliver a trimmed string.

 > I would guess that there are many such bugs.

Using `split-string' to "parse" mail headers is already a quick hack.
If a user wrote the header, even the presence of the required commas
is questionable (people often expect line breaks to separate
addresses), but `(split-string recipients ",")' followed by further
processing as necessary is probably OK most of the time.

 > To provide a clean and easy way to fix this, I have implemented a new
 > argument TRIM in split-string.  Please take a look.

Trimming leading and trailing whitespace is a generally useful
function.  Why not[1]

(defun trim-string (s &optional extract)
  (setq extract (or extract "^\\s-*\\<\\(.*\\)\\>\\s-*$"))
  (save-match-data
    ;; This may be an attractive nuisance, but is needed to keep
    ;; expressions that match on boundaries acceptably simple AFAICS:
    ;; eg, \< doesn't match anywhere in a blank or empty string.
    ;; Guard on `string-match' so that a failed match never reads
    ;; stale match data.
    (or (and (string-match extract s) (match-string 1 s)) "")))

and the intent of the split-string expr above can be implemented by
either

    (split-string (trim-string recipients) "[ \t\n]*,[ \t\n]*")

(probably more efficient) or

    (mapcar #'trim-string (split-string recipients ","))

(the semantics currently implemented).

If you decide to add TRIM, is this part of the docstring:

    If you want to trim whitespace from the substrings, the reliably
    correct way is using TRIM.  Making SEPARATORS match that
    whitespace gives incorrect results when there is whitespace at the
    start or end of STRING.  If you see such calls to `split-string',
    please fix them.

appropriate?  Shouldn't it be moved to the Lispref?

Footnotes:
[1]  Specialization to the case where leading and trailing strings
match the same regexp may be equivalent.  Here is that API using the
same algorithm:

(defun trim-string (s &optional trim)
  (setq trim (or trim "\\s-*\\<\\|\\>\\s-*"))
  ;; Shy groups keep the \| inside TRIM from splitting the whole regexp.
  (let ((extract (concat "^\\(?:" trim "\\)\\(.*\\)\\(?:" trim "\\)$")))
    (save-match-data
      (or (and (string-match extract s) (match-string 1 s)) ""))))

which works for the default case, but may not be quite as general for
arbitrary EXTRACT regexps.  Why the generality?  Eg (using the
EXTRACT-style API):

(trim-string s (concat "^"                         ; beginning of target
                       "\\s-*/?\\* "               ; C block comment leader
                       "\\(\\<.*\\>\\)?"           ; extract content
                       "\\s-*\\(?:\\*/?\\s-*\\)?"  ; C block comment trailer
                       "$"))                       ; end of target

I think this can also be expressed using TRIM (just alternate the C
block comment leader and trailer).
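
To make the difference between the variants above concrete, here is a
minimal sketch that should run in a recent GNU Emacs.  The names
`demo-recipients' and `demo-trim' are hypothetical, introduced only for
this illustration (`demo-trim' is a plain whitespace trimmer, not the
`trim-string' proposed above), and the last form assumes a
`split-string' that accepts TRIM as its fourth argument.

```elisp
;; Made-up header value with whitespace at both ends.
(defvar demo-recipients " a@example.org , b@example.org ")

(defun demo-trim (s)
  "Return S without leading or trailing spaces, tabs, or newlines.
Hypothetical helper for this sketch only."
  (save-match-data
    (let ((start (if (string-match "\\`[ \t\n]+" s) (match-end 0) 0)))
      ;; `string-match' returns the index where the trailing run of
      ;; whitespace begins (or the length of S if there is none).
      (substring s start (string-match "[ \t\n]*\\'" s start)))))

;; The call under discussion: whitespace at the outer ends survives.
;; => (" a@example.org" "b@example.org ")
(defvar demo-untrimmed
  (split-string demo-recipients "[ \t\n]*,[ \t\n]*"))

;; Trim once, then split on commas with surrounding whitespace.
;; => ("a@example.org" "b@example.org")
(defvar demo-trim-then-split
  (split-string (demo-trim demo-recipients) "[ \t\n]*,[ \t\n]*"))

;; Split on bare commas, then trim every piece.
;; => ("a@example.org" "b@example.org")
(defvar demo-split-then-trim
  (mapcar #'demo-trim (split-string demo-recipients ",")))

;; Assumption: `split-string' takes TRIM as its fourth argument.
;; => ("a@example.org" "b@example.org")
(defvar demo-with-trim-arg
  (split-string demo-recipients "," nil "[ \t\n]+"))
```

The last three forms agree on this input; they differ mainly in how
many passes they make over the string and in whether whitespace inside
an element (rather than around a comma) is ever touched.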