all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Bug in some calls to split-string.
@ 2013-07-19 19:09 Richard Stallman
  2013-07-20  7:34 ` Stephen J. Turnbull
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Stallman @ 2013-07-19 19:09 UTC (permalink / raw
  To: emacs-devel

        [ To any NSA and FBI agents reading my email: please consider
        [ whether defending the US Constitution against all enemies,
        [ foreign or domestic, requires you to follow Snowden's example.

I discovered a call to split-string that looked like this:

  (split-string recipients "[ \t\n]*,[ \t\n]*")

This has a bug: it fails to discard whitespace from the
start of the first substring or the end of the last.

I would guess that there are many such bugs.

To provide a clean and easy way to fix this, I have implemented a new
argument TRIM in split-string.  Please take a look.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Bug in some calls to split-string.
  2013-07-19 19:09 Bug in some calls to split-string Richard Stallman
@ 2013-07-20  7:34 ` Stephen J. Turnbull
  2013-07-20  7:56   ` Jambunathan K
  2013-07-20 13:35   ` Richard Stallman
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen J. Turnbull @ 2013-07-20  7:34 UTC (permalink / raw
  To: rms; +Cc: emacs-devel

Richard Stallman writes:

 > I discovered a call to split-string that looked like this:
 > 
 >   (split-string recipients "[ \t\n]*,[ \t\n]*")
 > 
 > This has a bug: it fails to discard whitespace from the
 > start of the first substring or the end of the last.

Bugginess depends on whether you can rely on the caller to deliver a
trimmed string.

 > I would guess that there are many such bugs.

Using `split-string' to "parse" mail headers is already a quick hack.
If a user wrote the header, even the presence of the required commas
is questionable (people often expect line breaks to separate
addresses), but `(split-string recipients ",")' followed by further
processing as necessary is probably OK most of the time.

 > To provide a clean and easy way to fix this, I have implemented a new
 > argument TRIM in split-string.  Please take a look.

Trimming leading and trailing whitespace is a generally useful
function.  Why not[1]

    (defun trim-string (s &optional extract)
      (setq extract (or extract "^\\s-*\\<\\(.*\\)\\>\\s-*$"))
      (save-match-data
        (string-match extract s)
          ;; This may be an attractive nuisance, but is needed to keep
          ;; expressions that match on boundaries acceptably simple AFAICS:
          ;; eg, \< doesn't match anywhere in a blank or empty string.
	  (or (match-string 1 s) "")))

and the intent of the split-string expr above can be implemented by
either

   (split-string (trim-string recipients) "[ \t\n]*,[ \t\n]*")

(probably more efficient) or

   (mapcar #'trim-string (split-string recipients ","))

(the semantics currently implemented).

If you decide to add TRIM, is this part of the docstring:

    If you want to trim whitespace from the substrings, the reliably
    correct way is using TRIM.  Making SEPARATORS match that
    whitespace gives incorrect results when there is whitespace at the
    start or end of STRING.  If you see such calls to `split-string',
    please fix them.

appropriate?  Shouldn't it be moved to the Lispref?


Footnotes: 
[1]  Specialization to the case where leading and trailing strings
match the same regexp may be equivalent.  Here is that API using the
same algorithm:

    (defun trim-string (s &optional trim)
      (setq trim (or trim "\\s-*\\<\\|\\>\\s-*")
      (let ((extract (concat "^" trim "\\(.*\\)" trim "$"))
        (save-match-data
          (string-match extract s)
	    (or (match-string 1 s) "")))

which works for the default case, but may not be quite as general for
arbitrary EXTRACT regexps.

Why the generality?  Eg (using the EXTRACT-style API):

(trim-string s (concat "^"                         ; beginning of target
                       "\\s-*/?\\* "               ; C block comment leader
                       "\\(\\<.*\\>\\)?"           ; extract content
                       "\\s-*\\(?:\\*/?\\s-*\\)?"  ; C block comment trailer
                       "$"))                       ; end of target

I think this can also be expressed using TRIM (just alternate the C
block comment leader and trailer).



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug in some calls to split-string.
  2013-07-20  7:34 ` Stephen J. Turnbull
@ 2013-07-20  7:56   ` Jambunathan K
  2013-07-20  8:36     ` Stephen J. Turnbull
  2013-07-20 13:35   ` Richard Stallman
  1 sibling, 1 reply; 5+ messages in thread
From: Jambunathan K @ 2013-07-20  7:56 UTC (permalink / raw
  To: Stephen J. Turnbull; +Cc: rms, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Trimming leading and trailing whitespace is a generally useful
> function.

Org-mode's usage of `org-trim' gives this proposal 100+ upvotes.

    (defun org-trim (s)
      "Remove whitespace at beginning and end of string."
      (if (string-match "\\`[ \t\n\r]+" s) (setq s (replace-match "" t t s)))
      (if (string-match "[ \t\n\r]+\\'" s) (setq s (replace-match "" t t s)))
      s)



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug in some calls to split-string.
  2013-07-20  7:56   ` Jambunathan K
@ 2013-07-20  8:36     ` Stephen J. Turnbull
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen J. Turnbull @ 2013-07-20  8:36 UTC (permalink / raw
  To: Jambunathan K; +Cc: rms, emacs-devel

Jambunathan K writes:
 > "Stephen J. Turnbull" <stephen@xemacs.org> writes:
 > 
 > > Trimming leading and trailing whitespace is a generally useful
 > > function.
 > 
 > Org-mode's usage of `org-trim' gives this proposal 100+ upvotes.
 > 
 >     (defun org-trim (s)
 >       "Remove whitespace at beginning and end of string."
 >       (if (string-match "\\`[ \t\n\r]+" s) (setq s (replace-match "" t t s)))
 >       (if (string-match "[ \t\n\r]+\\'" s) (setq s (replace-match "" t t s)))
 >       s)

Gah.  Of course I should have used \` and \'.

org-trim probably ought to use \s- instead of the explicit character
class.  I would at least add NBSP (U+00A0) to that class.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug in some calls to split-string.
  2013-07-20  7:34 ` Stephen J. Turnbull
  2013-07-20  7:56   ` Jambunathan K
@ 2013-07-20 13:35   ` Richard Stallman
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Stallman @ 2013-07-20 13:35 UTC (permalink / raw
  To: Stephen J. Turnbull; +Cc: emacs-devel

        [ To any NSA and FBI agents reading my email: please consider
        [ whether defending the US Constitution against all enemies,
        [ foreign or domestic, requires you to follow Snowden's example.

I have nothing against adding trim-string, but it needs
to take an argument saying what to trim, if it is to be general
enough to work with all the uses of split-string.

Also,

   (split-string recipients "," nil "[ \t\n]*")

is cleaner than

   (split-string (trim-string recipients "[ \t\n]*") "[ \t\n]*,[ \t\n]*")

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-07-20 13:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-19 19:09 Bug in some calls to split-string Richard Stallman
2013-07-20  7:34 ` Stephen J. Turnbull
2013-07-20  7:56   ` Jambunathan K
2013-07-20  8:36     ` Stephen J. Turnbull
2013-07-20 13:35   ` Richard Stallman

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.