unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: rms@gnu.org
Cc: emacs-devel@gnu.org
Subject: Bug in some calls to split-string.
Date: Sat, 20 Jul 2013 16:34:59 +0900	[thread overview]
Message-ID: <87hafpr8f0.fsf@uwakimon.sk.tsukuba.ac.jp> (raw)
In-Reply-To: <E1V0G3z-0006Tb-KN@fencepost.gnu.org>

Richard Stallman writes:

 > I discovered a call to split-string that looked like this:
 > 
 >   (split-string recipients "[ \t\n]*,[ \t\n]*")
 > 
 > This has a bug: it fails to discard whitespace from the
 > start of the first substring or the end of the last.

Bugginess depends on whether you can rely on the caller to deliver a
trimmed string.

 > I would guess that there are many such bugs.

Using `split-string' to "parse" mail headers is already a quick hack.
If a user wrote the header, even the presence of the required commas
is questionable (people often expect line breaks to separate
addresses), but `(split-string recipients ",")' followed by further
processing as necessary is probably OK most of the time.

 > To provide a clean and easy way to fix this, I have implemented a new
 > argument TRIM in split-string.  Please take a look.

Trimming leading and trailing whitespace is a generally useful
function.  Why not[1]

    (defun trim-string (s &optional extract)
      (setq extract (or extract "^\\s-*\\<\\(.*\\)\\>\\s-*$"))
      (save-match-data
        (string-match extract s)
          ;; This may be an attractive nuisance, but is needed to keep
          ;; expressions that match on boundaries acceptably simple AFAICS:
          ;; eg, \< doesn't match anywhere in a blank or empty string.
	  (or (match-string 1 s) "")))

and the intent of the split-string expr above can be implemented by
either

   (split-string (trim-string recipients) "[ \t\n]*,[ \t\n]*")

(probably more efficient) or

   (mapcar #'trim-string (split-string recipients ","))

(the semantics currently implemented).

If you decide to add TRIM, is this part of the docstring:

    If you want to trim whitespace from the substrings, the reliably
    correct way is using TRIM.  Making SEPARATORS match that
    whitespace gives incorrect results when there is whitespace at the
    start or end of STRING.  If you see such calls to `split-string',
    please fix them.

appropriate?  Shouldn't it be moved to the Lispref?


Footnotes: 
[1]  Specialization to the case where leading and trailing strings
match the same regexp may be equivalent.  Here is that API using the
same algorithm:

    (defun trim-string (s &optional trim)
      (setq trim (or trim "\\s-*\\<\\|\\>\\s-*")
      (let ((extract (concat "^" trim "\\(.*\\)" trim "$"))
        (save-match-data
          (string-match extract s)
	    (or (match-string 1 s) "")))

which works for the default case, but may not be quite as general for
arbitrary EXTRACT regexps.

Why the generality?  Eg (using the EXTRACT-style API):

(trim-string s (concat "^"                         ; beginning of target
                       "\\s-*/?\\* "               ; C block comment leader
                       "\\(\\<.*\\>\\)?"           ; extract content
                       "\\s-*\\(?:\\*/?\\s-*\\)?"  ; C block comment trailer
                       "$"))                       ; end of target

I think this can also be expressed using TRIM (just alternate the C
block comment leader and trailer).



  reply	other threads:[~2013-07-20  7:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-19 19:09 Bug in some calls to split-string Richard Stallman
2013-07-20  7:34 ` Stephen J. Turnbull [this message]
2013-07-20  7:56   ` Jambunathan K
2013-07-20  8:36     ` Stephen J. Turnbull
2013-07-20 13:35   ` Richard Stallman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87hafpr8f0.fsf@uwakimon.sk.tsukuba.ac.jp \
    --to=stephen@xemacs.org \
    --cc=emacs-devel@gnu.org \
    --cc=rms@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).