all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Whitespace search and regex.c
@ 2004-11-22 17:24 Stefan Monnier
  2004-11-25  2:21 ` Richard Stallman
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2004-11-22 17:24 UTC (permalink / raw)



What is this recent change to regex.c w.r.t whitespace search all about?
This is really ugly.

As best as I can tell, this is to avoid the problem where
(replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right
result because the " " could be inside brackets.

I think that changing regex.c for this one special case is a mistake.
Not only because it dirties up the regex.[ch] abstraction even further by
adding yet a bit more Emacs specific code in regex.[ch].

After all this problem manifests itself at a few other places (such as
regexp-opt-depth) as well.  So I think a more general solution would be much
more useful and cleaner.  E.g. a function (parse-partial-regex REGEXP POS)
which would return a value indicating whether POS is within brackets or not.


        Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-22 17:24 Whitespace search and regex.c Stefan Monnier
@ 2004-11-25  2:21 ` Richard Stallman
  2004-11-25 15:20   ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Stallman @ 2004-11-25  2:21 UTC (permalink / raw)
  Cc: emacs-devel

    What is this recent change to regex.c w.r.t whitespace search all about?
    This is really ugly.

    As best as I can tell, this is to avoid the problem where
    (replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right
    result because the " " could be inside brackets.

It has nothing to do with replace-regexp-in-string, which doesn't use
this feature.  It is for the sake of user-level search features that
want a series of SPCs to stand for some broader kind of whitespace.

    Not only because it dirties up the regex.[ch] abstraction even further by
    adding yet a bit more Emacs specific code in regex.[ch].

The Emacs version of regex.c is forever forked from the others.  A new
version of the regex code is being adopted in all our other packages,
but it doesn't handle a split buffer, so Emacs can't use it.

    After all this problem manifests itself at a few other places (such as
    regexp-opt-depth) as well.

I don't follow how this relates to regexp-opt-depth.
Would you please spell that out?

    E.g. a function (parse-partial-regex REGEXP POS)
    which would return a value indicating whether POS is within brackets or not.

That would be helpful for making C-q SRC in I-search DTRT both
inside and outside brackets.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-25  2:21 ` Richard Stallman
@ 2004-11-25 15:20   ` Stefan Monnier
  2004-11-26 21:03     ` Richard Stallman
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2004-11-25 15:20 UTC (permalink / raw)
  Cc: emacs-devel

>     What is this recent change to regex.c w.r.t whitespace search all about?
>     This is really ugly.

>     As best as I can tell, this is to avoid the problem where
>     (replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right
>     result because the " " could be inside brackets.

> It has nothing to do with replace-regexp-in-string, which doesn't use
> this feature.
I didn't mean to say that it was used by replace-regexp-in-string but that
it was used for those cases where you want a regex generated by
systematically replacing " " with something else (such as "\\(?:\s-+\\)").

In those cases, the obvious way to do it (with replace-regexp-in-string or
some piece of code that ends up doing something similar) suffers from the
fact that it will replace " " even if it appears inside brackets delimiting
a char-range.

> It is for the sake of user-level search features that
> want a series of SPCs to stand for some broader kind of whitespace.

I'm afraid that doesn't tell me really what it is for.  I.e. why is it
implemented this way rather than some other way?  What was the precise goal?

E.g. if you told me to implement a "user-level search features that
want a series of SPCs to stand for some broader kind of whitespace" it'd
never occur to me to fiddle with regex.[ch].  Instead, I'd add a piece of
elisp code which replaces every SPC (or sequence of SPC) in a regex with
some other regex.  E.g. I'd use something like (replace-regexp-in-string " "
"\\(?:\s-+\\)" ...).  Now maybe in order to correctly do the replacement in
the presence of brackets, I'd probably add a function like
(parse-partial-regex REGEXP POS), potentially (tho probably not at first)
implemented in regex.[ch].

>     After all this problem manifests itself at a few other places (such as
>     regexp-opt-depth) as well.
> I don't follow how this relates to regexp-opt-depth.
> Would you please spell that out?

Regexp-opt-depth has to count the number of occurences of "\\(" in a regexp,
but it should be careful not to count those occurences that appear within
brackets.

>     E.g. a function (parse-partial-regex REGEXP POS)
>     which would return a value indicating whether POS is within brackets or not.

> That would be helpful for making C-q SPC in I-search DTRT both
> inside and outside brackets.

Yes, and it would also help the previous code (before your changes) make SPC
DTRT both inside and outside brackets.  I thought your change was trying to
solve exactly this problem, and that it ends up just pushing it from SPC to
C-q SPC.


        Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-25 15:20   ` Stefan Monnier
@ 2004-11-26 21:03     ` Richard Stallman
  2004-11-26 22:12       ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Stallman @ 2004-11-26 21:03 UTC (permalink / raw)
  Cc: emacs-devel

    > It is for the sake of user-level search features that
    > want a series of SPCs to stand for some broader kind of whitespace.

    I'm afraid that doesn't tell me really what it is for.  I.e. why is it
    implemented this way rather than some other way?  What was the precise goal?

This was the easiest way I could find to do it reliably.
It avoids the need to write other code to parse a regexp.
I did not want to do that; I figured it would be a pain.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-26 21:03     ` Richard Stallman
@ 2004-11-26 22:12       ` Stefan Monnier
  2004-11-29  6:12         ` Richard Stallman
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2004-11-26 22:12 UTC (permalink / raw)
  Cc: emacs-devel

>> It is for the sake of user-level search features that
>> want a series of SPCs to stand for some broader kind of whitespace.

>     I'm afraid that doesn't tell me really what it is for.  I.e. why is it
>     implemented this way rather than some other way?  What was the
>     precise goal?

> This was the easiest way I could find to do it reliably.
> It avoids the need to write other code to parse a regexp.
> I did not want to do that; I figured it would be a pain.

Well, another easy way goes as follows:

      (defun in-brackets-p (regex pos)
        "Return non-nil if POS is within brackets in REGEX."
        (condition-case err
            (progn
              (string-match (substring regex 0 pos) "")
              nil)
          (error (equal (cadr err) "Unmatched [ or [^"))))


-- Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-26 22:12       ` Stefan Monnier
@ 2004-11-29  6:12         ` Richard Stallman
  2004-12-05 15:33           ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Stallman @ 2004-11-29  6:12 UTC (permalink / raw)
  Cc: emacs-devel

With your idea, I can solve the remaining problem in isearch.el.
This change seems unproblematical.

*** isearch.el	12 Oct 2004 04:45:09 -0400	1.242
--- isearch.el	28 Nov 2004 21:26:36 -0500	
***************
*** 109,120 ****
    :type 'boolean
    :group 'isearch)
  
! (defcustom search-whitespace-regexp "\\(?:\\s-+\\)"
    "*If non-nil, regular expression to match a sequence of whitespace chars.
  This applies to regular expression incremental search.
! You might want to use something like \"\\\\(?:[ \\t\\r\\n]+\\\\)\" instead.
! In the Customization buffer, that is `\\(?:[' followed by a space,
! a tab, a carriage return (control-M), a newline, and `]+\\)'."
    :type 'regexp
    :group 'isearch)
  
--- 109,122 ----
    :type 'boolean
    :group 'isearch)
  
! (defcustom search-whitespace-regexp "\\s-+"
    "*If non-nil, regular expression to match a sequence of whitespace chars.
  This applies to regular expression incremental search.
! When you put a space or spaces in the incremental regexp, it stands for
! this, unless it is inside of a regexp construct such as [...] or *, + or ?.
! You might want to use something like \"[ \\t\\r\\n]+\" instead.
! In the Customization buffer, that is `[' followed by a space,
! a tab, a carriage return (control-M), a newline, and `]+'."
    :type 'regexp
    :group 'isearch)
  
***************
*** 260,267 ****
      (define-key map "\r" 'isearch-exit)
      (define-key map "\C-j" 'isearch-printing-char)
      (define-key map "\t" 'isearch-printing-char)
-     (define-key map " " 'isearch-whitespace-chars)
-     (define-key map [?\S-\ ] 'isearch-whitespace-chars)
  
      (define-key map    "\C-w" 'isearch-yank-word-or-char)
      (define-key map "\M-\C-w" 'isearch-del-char)
--- 262,267 ----
***************
*** 485,491 ****
  Do incremental search forward for regular expression.
  With a prefix argument, do a regular string search instead.
  Like ordinary incremental search except that your input
! is treated as a regexp.  See \\[isearch-forward] for more info."
    (interactive "P\np")
    (isearch-mode t (null not-regexp) nil (not no-recursive-edit)))
  
--- 485,496 ----
  Do incremental search forward for regular expression.
  With a prefix argument, do a regular string search instead.
  Like ordinary incremental search except that your input
! is treated as a regexp.  See \\[isearch-forward] for more info.
! 
! In regexp incremental searches, a space or spaces normally matches
! any whitespace (the variable `search-whitespace-regexp' controls
! precisely what that means).  If you want to search for a literal space
! and nothing else, enter `[ ]'."
    (interactive "P\np")
    (isearch-mode t (null not-regexp) nil (not no-recursive-edit)))
  
***************
*** 1676,1686 ****
      ;; Assume character codes 0200 - 0377 stand for characters in some
      ;; single-byte character set, and convert them to Emacs
      ;; characters.
!     (and enable-multibyte-characters
! 	 (>= char ?\200)
! 	 (<= char ?\377)
! 	 (setq char (unibyte-char-to-multibyte char)))
!     (isearch-process-search-char char)))
  
  (defun isearch-return-char ()
    "Convert return into newline for incremental search.
--- 1681,1699 ----
      ;; Assume character codes 0200 - 0377 stand for characters in some
      ;; single-byte character set, and convert them to Emacs
      ;; characters.
!     (if (and isearch-regexp (= char ?\ ))
! 	(if (condition-case err
! 		(progn
! 		  (string-match isearch-string "")
! 		  nil)
! 	      (error (equal (cadr err) "Unmatched [ or [^")))
! 	    (isearch-process-search-char char)
! 	  (isearch-process-search-string "[ ]" " "))
!       (and enable-multibyte-characters
! 	   (>= char ?\200)
! 	   (<= char ?\377)
! 	   (setq char (unibyte-char-to-multibyte char)))
!       (isearch-process-search-char char))))
  
  (defun isearch-return-char ()
    "Convert return into newline for incremental search.
***************
*** 1704,1725 ****
  	  (isearch-process-search-multibyte-characters char)
  	(isearch-process-search-char char)))))
  
- (defun isearch-whitespace-chars ()
-   "Match all whitespace chars, if in regexp mode.
- If you want to search for just a space, type \\<isearch-mode-map>\\[isearch-quote-char] SPC."
-   (interactive)
-   (if isearch-regexp
-       (if (and search-whitespace-regexp (not isearch-within-brackets)
- 	       (not isearch-invalid-regexp))
- 	  (isearch-process-search-string search-whitespace-regexp " ")
- 	(isearch-printing-char))
-     (progn
-       ;; This way of doing word search doesn't correctly extend current search.
-       ;;      (setq isearch-word t)
-       ;;      (setq isearch-adjusted t)
-       ;;      (goto-char isearch-barrier)
-       (isearch-printing-char))))
- 
  (defun isearch-process-search-char (char)
    ;; Append the char to the search string, update the message and re-search.
    (isearch-process-search-string
--- 1717,1722 ----
***************
*** 1960,1965 ****
--- 1957,1963 ----
        (let ((inhibit-point-motion-hooks search-invisible)
  	    (inhibit-quit nil)
  	    (case-fold-search isearch-case-fold-search)
+ 	    (search-spaces-regexp search-whitespace-regexp)
  	    (retry t))
  	(if isearch-regexp (setq isearch-invalid-regexp nil))
  	(setq isearch-within-brackets nil)
***************
*** 2373,2379 ****
  (defun isearch-lazy-highlight-search ()
    "Search ahead for the next or previous match, for lazy highlighting.
  Attempt to do the search exactly the way the pending isearch would."
!   (let ((case-fold-search isearch-case-fold-search))
      (funcall (isearch-search-fun)
               isearch-string
               (if isearch-forward
--- 2371,2378 ----
  (defun isearch-lazy-highlight-search ()
    "Search ahead for the next or previous match, for lazy highlighting.
  Attempt to do the search exactly the way the pending isearch would."
!   (let ((case-fold-search isearch-case-fold-search)
! 	(search-spaces-regexp search-whitespace-regexp))
      (funcall (isearch-search-fun)
               isearch-string
               (if isearch-forward

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Whitespace search and regex.c
  2004-11-29  6:12         ` Richard Stallman
@ 2004-12-05 15:33           ` Juri Linkov
  0 siblings, 0 replies; 7+ messages in thread
From: Juri Linkov @ 2004-12-05 15:33 UTC (permalink / raw)
  Cc: monnier, emacs-devel

I have no strong opinion on whether changing regex.c was a good thing
or not (after all, Perl already entered on the same path where its regexp
engine is modified every time someone fells a need to add a new feature.
BTW, setting `search-spaces-regexp' to the empty string would allow
to improve Emacs regexp readability similar to the /x modifier in Perl
regexps.  I won't elaborate on how useful it is, but I always wished
that Emacs regexps would use features of Perl regexps).

Anyhow, I want to note that there is a bug in the current
implementation of `search-spaces-regexp' feature.  The space
at the end of a regexp is interpreted literally, not as a whitespace
specified by `search-spaces-regexp'.  Whereas the space in the
middle or at the beginning of a regexp is interpreted correctly
(i.e. as a special whitespace character).

Test case:

(let ((search-spaces-regexp "\\s-+")) (re-search-forward "a b"))
finds the first line in the test text below (CORRECT).

(let ((search-spaces-regexp "\\s-+")) (re-search-forward "a "))
finds the second line in the test text below (INCORRECT, it should find
the first line).

Test text:
a	b   (has TAB between `a' and `b')
a b         (has SPC between `a' and `b')

-- 
Juri Linkov
http://www.jurta.org/emacs/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-12-05 15:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-22 17:24 Whitespace search and regex.c Stefan Monnier
2004-11-25  2:21 ` Richard Stallman
2004-11-25 15:20   ` Stefan Monnier
2004-11-26 21:03     ` Richard Stallman
2004-11-26 22:12       ` Stefan Monnier
2004-11-29  6:12         ` Richard Stallman
2004-12-05 15:33           ` Juri Linkov

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.