* Whitespace search and regex.c @ 2004-11-22 17:24 Stefan Monnier 2004-11-25 2:21 ` Richard Stallman 0 siblings, 1 reply; 7+ messages in thread From: Stefan Monnier @ 2004-11-22 17:24 UTC (permalink / raw) What is this recent change to regex.c w.r.t whitespace search all about? This is really ugly. As best as I can tell, this is to avoid the problem where (replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right result because the " " could be inside brackets. I think that changing regex.c for this one special case is a mistake. Not only because it dirties up the regex.[ch] abstraction even further by adding yet a bit more Emacs specific code in regex.[ch]. After all this problem manifests itself at a few other places (such as regexp-opt-depth) as well. So I think a more general solution would be much more useful and cleaner. E.g. a function (parse-partial-regex REGEXP POS) which would return a value indicating whether POS is within brackets or not. Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-22 17:24 Whitespace search and regex.c Stefan Monnier @ 2004-11-25 2:21 ` Richard Stallman 2004-11-25 15:20 ` Stefan Monnier 0 siblings, 1 reply; 7+ messages in thread From: Richard Stallman @ 2004-11-25 2:21 UTC (permalink / raw) Cc: emacs-devel What is this recent change to regex.c w.r.t whitespace search all about? This is really ugly. As best as I can tell, this is to avoid the problem where (replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right result because the " " could be inside brackets. It has nothing to do with replace-regexp-in-string, which doesn't use this feature. It is for the sake of user-level search features that want a series of SPCs to stand for some broader kind of whitespace. Not only because it dirties up the regex.[ch] abstraction even further by adding yet a bit more Emacs specific code in regex.[ch]. The Emacs version of regex.c is forever forked from the others. A new version of the regex code is being adopted in all our other packages, but it doesn't handle a split buffer, so Emacs can't use it. After all this problem manifests itself at a few other places (such as regexp-opt-depth) as well. I don't follow how this relates to regexp-opt-depth. Would you please spell that out? E.g. a function (parse-partial-regex REGEXP POS) which would return a value indicating whether POS is within brackets or not. That would be helpful for making C-q SRC in I-search DTRT both inside and outside brackets. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-25 2:21 ` Richard Stallman @ 2004-11-25 15:20 ` Stefan Monnier 2004-11-26 21:03 ` Richard Stallman 0 siblings, 1 reply; 7+ messages in thread From: Stefan Monnier @ 2004-11-25 15:20 UTC (permalink / raw) Cc: emacs-devel > What is this recent change to regex.c w.r.t whitespace search all about? > This is really ugly. > As best as I can tell, this is to avoid the problem where > (replace-regexp-in-string " " "\\(?:\s-+\\)" ...) does not give the right > result because the " " could be inside brackets. > It has nothing to do with replace-regexp-in-string, which doesn't use > this feature. I didn't mean to say that it was used by replace-regexp-in-string but that it was used for those cases where you want a regex generated by systematically replacing " " with something else (such as "\\(?:\s-+\\)"). In those cases, the obvious way to do it (with replace-regexp-in-string or some piece of code that ends up doing something similar) suffers from the fact that it will replace " " even if it appears inside brackets delimiting a char-range. > It is for the sake of user-level search features that > want a series of SPCs to stand for some broader kind of whitespace. I'm afraid that doesn't tell me really what it is for. I.e. why is it implemented this way rather than some other way? What was the precise goal? E.g. if you told me to implement a "user-level search features that want a series of SPCs to stand for some broader kind of whitespace" it'd never occur to me to fiddle with regex.[ch]. Instead, I'd add a piece of elisp code which replaces every SPC (or sequence of SPC) in a regex with some other regex. E.g. I'd use something like (replace-regexp-in-string " " "\\(?:\s-+\\)" ...). Now maybe in order to correctly do the replacement in the presence of brackets, I'd probably add a function like (parse-partial-regex REGEXP POS), potentially (tho probably not at first) implemented in regex.[ch]. > After all this problem manifests itself at a few other places (such as > regexp-opt-depth) as well. > I don't follow how this relates to regexp-opt-depth. > Would you please spell that out? Regexp-opt-depth has to count the number of occurences of "\\(" in a regexp, but it should be careful not to count those occurences that appear within brackets. > E.g. a function (parse-partial-regex REGEXP POS) > which would return a value indicating whether POS is within brackets or not. > That would be helpful for making C-q SPC in I-search DTRT both > inside and outside brackets. Yes, and it would also help the previous code (before your changes) make SPC DTRT both inside and outside brackets. I thought your change was trying to solve exactly this problem, and that it ends up just pushing it from SPC to C-q SPC. Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-25 15:20 ` Stefan Monnier @ 2004-11-26 21:03 ` Richard Stallman 2004-11-26 22:12 ` Stefan Monnier 0 siblings, 1 reply; 7+ messages in thread From: Richard Stallman @ 2004-11-26 21:03 UTC (permalink / raw) Cc: emacs-devel > It is for the sake of user-level search features that > want a series of SPCs to stand for some broader kind of whitespace. I'm afraid that doesn't tell me really what it is for. I.e. why is it implemented this way rather than some other way? What was the precise goal? This was the easiest way I could find to do it reliably. It avoids the need to write other code to parse a regexp. I did not want to do that; I figured it would be a pain. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-26 21:03 ` Richard Stallman @ 2004-11-26 22:12 ` Stefan Monnier 2004-11-29 6:12 ` Richard Stallman 0 siblings, 1 reply; 7+ messages in thread From: Stefan Monnier @ 2004-11-26 22:12 UTC (permalink / raw) Cc: emacs-devel >> It is for the sake of user-level search features that >> want a series of SPCs to stand for some broader kind of whitespace. > I'm afraid that doesn't tell me really what it is for. I.e. why is it > implemented this way rather than some other way? What was the > precise goal? > This was the easiest way I could find to do it reliably. > It avoids the need to write other code to parse a regexp. > I did not want to do that; I figured it would be a pain. Well, another easy way goes as follows: (defun in-brackets-p (regex pos) "Return non-nil if POS is within brackets in REGEX." (condition-case err (progn (string-match (substring regex 0 pos) "") nil) (error (equal (cadr err) "Unmatched [ or [^")))) -- Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-26 22:12 ` Stefan Monnier @ 2004-11-29 6:12 ` Richard Stallman 2004-12-05 15:33 ` Juri Linkov 0 siblings, 1 reply; 7+ messages in thread From: Richard Stallman @ 2004-11-29 6:12 UTC (permalink / raw) Cc: emacs-devel With your idea, I can solve the remaining problem in isearch.el. This change seems unproblematical. *** isearch.el 12 Oct 2004 04:45:09 -0400 1.242 --- isearch.el 28 Nov 2004 21:26:36 -0500 *************** *** 109,120 **** :type 'boolean :group 'isearch) ! (defcustom search-whitespace-regexp "\\(?:\\s-+\\)" "*If non-nil, regular expression to match a sequence of whitespace chars. This applies to regular expression incremental search. ! You might want to use something like \"\\\\(?:[ \\t\\r\\n]+\\\\)\" instead. ! In the Customization buffer, that is `\\(?:[' followed by a space, ! a tab, a carriage return (control-M), a newline, and `]+\\)'." :type 'regexp :group 'isearch) --- 109,122 ---- :type 'boolean :group 'isearch) ! (defcustom search-whitespace-regexp "\\s-+" "*If non-nil, regular expression to match a sequence of whitespace chars. This applies to regular expression incremental search. ! When you put a space or spaces in the incremental regexp, it stands for ! this, unless it is inside of a regexp construct such as [...] or *, + or ?. ! You might want to use something like \"[ \\t\\r\\n]+\" instead. ! In the Customization buffer, that is `[' followed by a space, ! a tab, a carriage return (control-M), a newline, and `]+'." :type 'regexp :group 'isearch) *************** *** 260,267 **** (define-key map "\r" 'isearch-exit) (define-key map "\C-j" 'isearch-printing-char) (define-key map "\t" 'isearch-printing-char) - (define-key map " " 'isearch-whitespace-chars) - (define-key map [?\S-\ ] 'isearch-whitespace-chars) (define-key map "\C-w" 'isearch-yank-word-or-char) (define-key map "\M-\C-w" 'isearch-del-char) --- 262,267 ---- *************** *** 485,491 **** Do incremental search forward for regular expression. With a prefix argument, do a regular string search instead. Like ordinary incremental search except that your input ! is treated as a regexp. See \\[isearch-forward] for more info." (interactive "P\np") (isearch-mode t (null not-regexp) nil (not no-recursive-edit))) --- 485,496 ---- Do incremental search forward for regular expression. With a prefix argument, do a regular string search instead. Like ordinary incremental search except that your input ! is treated as a regexp. See \\[isearch-forward] for more info. ! ! In regexp incremental searches, a space or spaces normally matches ! any whitespace (the variable `search-whitespace-regexp' controls ! precisely what that means). If you want to search for a literal space ! and nothing else, enter `[ ]'." (interactive "P\np") (isearch-mode t (null not-regexp) nil (not no-recursive-edit))) *************** *** 1676,1686 **** ;; Assume character codes 0200 - 0377 stand for characters in some ;; single-byte character set, and convert them to Emacs ;; characters. ! (and enable-multibyte-characters ! (>= char ?\200) ! (<= char ?\377) ! (setq char (unibyte-char-to-multibyte char))) ! (isearch-process-search-char char))) (defun isearch-return-char () "Convert return into newline for incremental search. --- 1681,1699 ---- ;; Assume character codes 0200 - 0377 stand for characters in some ;; single-byte character set, and convert them to Emacs ;; characters. ! (if (and isearch-regexp (= char ?\ )) ! (if (condition-case err ! (progn ! (string-match isearch-string "") ! nil) ! (error (equal (cadr err) "Unmatched [ or [^"))) ! (isearch-process-search-char char) ! (isearch-process-search-string "[ ]" " ")) ! (and enable-multibyte-characters ! (>= char ?\200) ! (<= char ?\377) ! (setq char (unibyte-char-to-multibyte char))) ! (isearch-process-search-char char)))) (defun isearch-return-char () "Convert return into newline for incremental search. *************** *** 1704,1725 **** (isearch-process-search-multibyte-characters char) (isearch-process-search-char char))))) - (defun isearch-whitespace-chars () - "Match all whitespace chars, if in regexp mode. - If you want to search for just a space, type \\<isearch-mode-map>\\[isearch-quote-char] SPC." - (interactive) - (if isearch-regexp - (if (and search-whitespace-regexp (not isearch-within-brackets) - (not isearch-invalid-regexp)) - (isearch-process-search-string search-whitespace-regexp " ") - (isearch-printing-char)) - (progn - ;; This way of doing word search doesn't correctly extend current search. - ;; (setq isearch-word t) - ;; (setq isearch-adjusted t) - ;; (goto-char isearch-barrier) - (isearch-printing-char)))) - (defun isearch-process-search-char (char) ;; Append the char to the search string, update the message and re-search. (isearch-process-search-string --- 1717,1722 ---- *************** *** 1960,1965 **** --- 1957,1963 ---- (let ((inhibit-point-motion-hooks search-invisible) (inhibit-quit nil) (case-fold-search isearch-case-fold-search) + (search-spaces-regexp search-whitespace-regexp) (retry t)) (if isearch-regexp (setq isearch-invalid-regexp nil)) (setq isearch-within-brackets nil) *************** *** 2373,2379 **** (defun isearch-lazy-highlight-search () "Search ahead for the next or previous match, for lazy highlighting. Attempt to do the search exactly the way the pending isearch would." ! (let ((case-fold-search isearch-case-fold-search)) (funcall (isearch-search-fun) isearch-string (if isearch-forward --- 2371,2378 ---- (defun isearch-lazy-highlight-search () "Search ahead for the next or previous match, for lazy highlighting. Attempt to do the search exactly the way the pending isearch would." ! (let ((case-fold-search isearch-case-fold-search) ! (search-spaces-regexp search-whitespace-regexp)) (funcall (isearch-search-fun) isearch-string (if isearch-forward ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Whitespace search and regex.c 2004-11-29 6:12 ` Richard Stallman @ 2004-12-05 15:33 ` Juri Linkov 0 siblings, 0 replies; 7+ messages in thread From: Juri Linkov @ 2004-12-05 15:33 UTC (permalink / raw) Cc: monnier, emacs-devel I have no strong opinion on whether changing regex.c was a good thing or not (after all, Perl already entered on the same path where its regexp engine is modified every time someone fells a need to add a new feature. BTW, setting `search-spaces-regexp' to the empty string would allow to improve Emacs regexp readability similar to the /x modifier in Perl regexps. I won't elaborate on how useful it is, but I always wished that Emacs regexps would use features of Perl regexps). Anyhow, I want to note that there is a bug in the current implementation of `search-spaces-regexp' feature. The space at the end of a regexp is interpreted literally, not as a whitespace specified by `search-spaces-regexp'. Whereas the space in the middle or at the beginning of a regexp is interpreted correctly (i.e. as a special whitespace character). Test case: (let ((search-spaces-regexp "\\s-+")) (re-search-forward "a b")) finds the first line in the test text below (CORRECT). (let ((search-spaces-regexp "\\s-+")) (re-search-forward "a ")) finds the second line in the test text below (INCORRECT, it should find the first line). Test text: a b (has TAB between `a' and `b') a b (has SPC between `a' and `b') -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2004-12-05 15:33 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-11-22 17:24 Whitespace search and regex.c Stefan Monnier 2004-11-25 2:21 ` Richard Stallman 2004-11-25 15:20 ` Stefan Monnier 2004-11-26 21:03 ` Richard Stallman 2004-11-26 22:12 ` Stefan Monnier 2004-11-29 6:12 ` Richard Stallman 2004-12-05 15:33 ` Juri Linkov
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).