* regular expressions that match nothing
@ 2019-05-14  7:25 philippe schnoebelen
  2019-05-14 10:14 ` Mattias Engdegård
  0 siblings, 1 reply; 37+ messages in thread

From: philippe schnoebelen @ 2019-05-14  7:25 UTC
To: emacs-devel

[-- Attachment #1.1: Type: text/plain, Size: 982 bytes --]

I was very happy to see that in v27.0.50 (regexp-opt nil) now properly
returns a regular expression that matches nothing, namely a\`. Thanks to
whoever fixed that old bug.

I was wondering why (regexp-opt nil) uses a\` and not \'a or another
option like \=a\= so I did some profiling (see attached code).

The different options that I tried have more or less the same response
time when one checks, via looking-at, whether the regexp matches at
point. But when one searches for a match across a whole buffer, some
options behave notably faster than the others. And a\` is not the best;
e.g., \=a\= is way faster. Maybe some other solutions would be even
faster.

Of course this may be dependent on the internals of the specific regexp
library at hand. I do not know enough to judge. In fact I believe that a
solid regular expression library should provide a specific regular
expression that matches nothing, with special but easy treatment that
guarantees the best response time.

--phs

[-- Attachment #1.2: Type: text/html, Size: 1153 bytes --]

[-- Attachment #2: profile-empty-regexp.el --]
[-- Type: application/octet-stream, Size: 1542 bytes --]

(defun profile-empty-regexps (&optional buffer)
  "Report some matching times for several regular expressions."
  (interactive)
  (unless buffer
    (setq buffer (find-file-noselect (locate-library "regexp-opt"))))
  (with-output-to-temp-buffer "Profiling"
    (princ (profile-one-regexp "a\\`" buffer))
    (princ "\n")
    (princ (profile-one-regexp "\\'a" buffer))
    (princ "\n")
    (princ (profile-one-regexp "\\=.\\=" buffer))
    (princ "\n")
    (princ (profile-one-regexp "\\=a\\=" buffer))
    (princ "\n")
    ;; Matches a literal ".", so a match (and the warning) is expected.
    (princ (profile-one-regexp "\\." buffer))
    (princ "\n")))

(defun profile-one-regexp (regexp buffer &optional nbrepeats)
  ;; The workhorse
  (setq nbrepeats (or nbrepeats 50000))
  (let (start-time duration1 duration2 found)
    (with-current-buffer buffer
      (save-excursion
        ;; Time NBREPEATS match attempts anchored at point-min.
        (setq start-time (current-time))
        (goto-char (point-min))
        (dotimes (_ nbrepeats)
          (looking-at regexp))
        (setq duration1 (time-subtract (current-time) start-time))
        ;; Time NBREPEATS searches through the whole buffer.
        (setq start-time (current-time))
        (goto-char (point-min))
        (dotimes (_ nbrepeats)
          (when (re-search-forward regexp nil t)
            (setq found t)
            (goto-char (point-min)))) ;; return to test position
        (setq duration2 (time-subtract (current-time) start-time))))
    (format "Testing regexp %s %d times\n\tmatch at point-min: %.4fs\n\tsearch in buffer %s (size %d): %.4fs\n%s"
            regexp nbrepeats
            (float-time duration1)
            (buffer-name buffer) (buffer-size buffer)
            (float-time duration2)
            (if found "\t*** WARNING *** a match was found\n" ""))))
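The candidates profiled above fail for the same reason: each requires an anchor that cannot hold where it is tested. In a\` the start of the text must come right after an "a" has been consumed; in \'a an "a" must follow the end of the text; and \`a\` needs to be back at the start after consuming "a". A minimal sketch that checks this with string-match-p on a few sample strings follows; the names my-regexp-never-matches-p and my-sample-strings are hypothetical. The \= variants are left out because \= refers to point, which is not defined when matching against a string.

```elisp
;; Hypothetical helper: t if REGEXP matches none of STRINGS.
(defun my-regexp-never-matches-p (regexp strings)
  "Return t if REGEXP matches none of STRINGS, nil otherwise."
  (let ((ok t))
    ;; DOLIST returns OK once every string has been tried.
    (dolist (s strings ok)
      (when (string-match-p regexp s)
        (setq ok nil)))))

;; Hypothetical sample inputs: empty text, runs of "a", and multi-line text.
(defconst my-sample-strings '("" "a" "aa" "banana" "a\nb\n"))
```

For instance, (my-regexp-never-matches-p "\\`a\\`" my-sample-strings) should return t, while (my-regexp-never-matches-p "\\." '("a.b")) returns nil, mirroring the warning the profiling code prints for "\\.".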
* Re: regular expressions that match nothing
  2019-05-14  7:25 regular expressions that match nothing philippe schnoebelen
@ 2019-05-14 10:14 ` Mattias Engdegård
  2019-05-14 19:41   ` Stefan Monnier
  0 siblings, 1 reply; 37+ messages in thread

From: Mattias Engdegård @ 2019-05-14 10:14 UTC
To: philippe schnoebelen, emacs-devel; +Cc: acm

[-- Attachment #1: Type: text/plain, Size: 989 bytes --]

On Tue, 2019-05-14 at 09:25 +0200, philippe schnoebelen wrote:
> I was wondering why (regexp-opt nil) uses a\` and not \'a or another
> option like \=a\= so I did some profiling (see attached code).

Thank you, and sorry about my bad initial attempt. I tried a few more,
like [z-a], \c* and \sq, but these were no better. The distribution is
decidedly bimodal; there seems to be no significant difference between
the 'fast' ones, so I went with \`a\` in the attached patch.

> Of course this may be dependent on the internals of the specific
> regexp library at hand. I do not know enough to judge. In fact I
> believe that a solid regular expression library should provide a
> specific regular expression that matches nothing with special but
> easy treatment that guarantees best response time.

We could add a standard constant for it, like unmatchable-regexp, so
that at least people don't keep reinventing it.

We could also make (rx (or)) work. (It does in my complete rx rewrite.)

[-- Attachment #2: 0001-Use-faster-unmatchable-regexp.patch --]
[-- Type: text/x-patch, Size: 14699 bytes --]

From 8c296cae6e0363485b1f24b76e7fb06984f3117e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Tue, 14 May 2019 11:43:49 +0200
Subject: [PATCH] Use faster unmatchable regexp

The unmatchable regexp a\` used in many places is unnecessarily slow
in searching (unanchored) matches; replace it with the faster \`a\`
as suggested by Philippe Schnoebelen.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt) * lisp/progmodes/cc-defs.el (cc-conditional-require-after-load) (c-make-keywords-re) * lisp/progmodes/cc-engine.el (c-beginning-of-statement-1) (c-forward-<>-arglist-recur, c-forward-decl-or-cast-1) (c-looking-at-decl-block) * lisp/progmodes/cc-fonts.el (c-doc-line-join-re) (c-doc-bright-comment-start-re) * lisp/progmodes/cc-langs.el (c-populate-syntax-table) (c-assignment-op-regexp) (c-block-comment-ender-regexp, c-font-lock-comment-end-skip) (c-block-comment-start-regexp, c-line-comment-start-regexp) (c-doc-comment-start-regexp, c-decl-start-colon-kwd-re) (c-type-decl-prefix-key, c-type-decl-operator-prefix-key) (c-pre-id-bracelist-key, c-enum-clause-introduction-re) (c-nonlabel-token-2-key) * lisp/progmodes/cc-mode.el (c-doc-fl-decl-start, c-doc-fl-decl-end) * lisp/progmodes/cc-vars.el (c-noise-macro-with-parens-name-re) (c-noise-macro-name-re, c-make-noise-macro-regexps) * lisp/progmodes/octave.el (octave-help-mode): Use faster unmatchable regexp. * lisp/textmodes/ispell.el (ispell-non-empty-string): Fix broken 'unmatchable' regexp. --- lisp/emacs-lisp/regexp-opt.el | 4 ++-- lisp/progmodes/cc-defs.el | 6 +++--- lisp/progmodes/cc-engine.el | 8 ++++---- lisp/progmodes/cc-fonts.el | 4 ++-- lisp/progmodes/cc-langs.el | 26 +++++++++++++------------- lisp/progmodes/cc-mode.el | 4 ++-- lisp/progmodes/cc-vars.el | 8 ++++---- lisp/progmodes/octave.el | 2 +- lisp/textmodes/ispell.el | 2 +- 9 files changed, 32 insertions(+), 32 deletions(-) diff --git a/lisp/emacs-lisp/regexp-opt.el b/lisp/emacs-lisp/regexp-opt.el index d883752d71..220a4bfa7a 100644 --- a/lisp/emacs-lisp/regexp-opt.el +++ b/lisp/emacs-lisp/regexp-opt.el @@ -144,9 +144,9 @@ regexp-opt (sort (copy-sequence strings) 'string-lessp))) (re (cond - ;; No strings: return a\` which cannot match anything. + ;; No strings: return \`a\` which cannot match anything. 
((null strings) - (concat (or open "\\(?:") "a\\`\\)")) + (concat (or open "\\(?:") "\\`a\\`\\)")) ;; If we cannot reorder, give up all attempts at ;; optimisation. There is room for improvement (Bug#34641). ((and keep-order (regexp-opt--contains-prefix sorted-strings)) diff --git a/lisp/progmodes/cc-defs.el b/lisp/progmodes/cc-defs.el index cd4ed6b352..1ae328d4ff 100644 --- a/lisp/progmodes/cc-defs.el +++ b/lisp/progmodes/cc-defs.el @@ -81,7 +81,7 @@ (progn (require 'font-lock) (let (font-lock-keywords) - (font-lock-compile-keywords '("a\\`")) ; doesn't match anything. + (font-lock-compile-keywords '("\\`a\\`")) ; doesn't match anything. font-lock-keywords)))) \f @@ -1890,8 +1890,8 @@ c-make-keywords-re ;; Produce a regexp that doesn't match anything. (if adorn - "\\(a\\`\\)" - "a\\`"))) + "\\(\\`a\\`\\)" + "\\`a\\`"))) (put 'c-make-keywords-re 'lisp-indent-function 1) diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el index a25d059553..648e7197f3 100644 --- a/lisp/progmodes/cc-engine.el +++ b/lisp/progmodes/cc-engine.el @@ -907,7 +907,7 @@ c-beginning-of-statement-1 stack ;; Regexp which matches "for", "if", etc. (cond-key (or c-opt-block-stmt-key - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. ;; Return value. (ret 'same) ;; Positions of the last three sexps or bounds we've stopped at. @@ -7638,7 +7638,7 @@ c-forward-<>-arglist-recur (progn (c-forward-syntactic-ws) (when (or (and c-record-type-identifiers all-types) - (not (equal c-inside-<>-type-key "\\(a\\`\\)"))) + (not (equal c-inside-<>-type-key "\\(\\`a\\`\\)"))) (c-forward-syntactic-ws) (cond ((eq (char-after) ??) @@ -9245,7 +9245,7 @@ c-forward-decl-or-cast-1 ;; Skip over type decl prefix operators. (Note similar code in ;; `c-forward-declarator'.) 
(if (and c-recognize-typeless-decls - (equal c-type-decl-prefix-key "a\\`")) ; Regexp which doesn't match + (equal c-type-decl-prefix-key "\\`a\\`")) ; Regexp which doesn't match (when (eq (char-after) ?\() (progn (setq paren-depth (1+ paren-depth)) @@ -10878,7 +10878,7 @@ c-looking-at-decl-block ;; legal because it's part of a "compound keyword" like ;; "enum class". Of course, if c-after-brace-list-key ;; is nil, we can skip the test. - (or (equal c-after-brace-list-key "a\\`") ; Regexp which doesn't match + (or (equal c-after-brace-list-key "\\`a\\`") ; Regexp which doesn't match (save-match-data (save-excursion (not diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el index 5f09be60a6..bed9259e42 100644 --- a/lisp/progmodes/cc-fonts.el +++ b/lisp/progmodes/cc-fonts.el @@ -2580,14 +2580,14 @@ pike-font-lock-keywords \f ;;; Doc comments. -(defvar c-doc-line-join-re "a\\`") +(defvar c-doc-line-join-re "\\`a\\`") ;; Matches a join of two lines in a doc comment. ;; This should not be changed directly, but instead set by ;; `c-setup-doc-comment-style'. This variable is used in `c-find-decl-spots' ;; in (e.g.) autodoc style comments to bridge the gap between a "@\n" at an ;; EOL and the token following "//!" on the next line. -(defvar c-doc-bright-comment-start-re "a\\`") +(defvar c-doc-bright-comment-start-re "\\`a\\`") ;; Matches the start of a "bright" comment, one whose contents may be ;; fontified by, e.g., `c-font-lock-declarations'. 
diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el index 8b7e4ef7c0..6fdcd6cfc7 100644 --- a/lisp/progmodes/cc-langs.el +++ b/lisp/progmodes/cc-langs.el @@ -945,7 +945,7 @@ c-populate-syntax-table (c-make-keywords-re 'appendable (c-lang-const c-cpp-include-directives)) "[ \t]*") - "a\\`")) ; Doesn't match anything + "\\`a\\`")) ; Doesn't match anything (c-lang-defvar c-cpp-include-key (c-lang-const c-cpp-include-key)) (c-lang-defconst c-opt-cpp-macro-define @@ -1331,7 +1331,7 @@ 'c-opt-op-identitier-prefix (c--set-difference (c-lang-const c-assignment-operators) '("=") :test 'string-equal))) - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-defvar c-assignment-op-regexp (c-lang-const c-assignment-op-regexp)) @@ -1554,7 +1554,7 @@ 'c-opt-op-identitier-prefix ;; language) t (if (c-lang-const c-block-comment-ender) (regexp-quote (c-lang-const c-block-comment-ender)) - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-defvar c-block-comment-ender-regexp (c-lang-const c-block-comment-ender-regexp)) @@ -1565,7 +1565,7 @@ 'c-opt-op-identitier-prefix ;; `font-lock-comment-delimiter-face'. t (if (c-lang-const c-block-comment-ender) (concat "[ \t]*" (c-lang-const c-block-comment-ender-regexp)) - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-setvar font-lock-comment-end-skip (c-lang-const c-font-lock-comment-end-skip)) @@ -1584,7 +1584,7 @@ 'c-opt-op-identitier-prefix ;; language) t (if (c-lang-const c-block-comment-starter) (regexp-quote (c-lang-const c-block-comment-starter)) - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-defvar c-block-comment-start-regexp (c-lang-const c-block-comment-start-regexp)) @@ -1593,7 +1593,7 @@ 'c-opt-op-identitier-prefix ;; language; it does in all 7 CC Mode languages). 
t (if (c-lang-const c-line-comment-starter) (regexp-quote (c-lang-const c-line-comment-starter)) - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-defvar c-line-comment-start-regexp (c-lang-const c-line-comment-start-regexp)) @@ -1628,7 +1628,7 @@ 'c-opt-op-identitier-prefix (c-lang-defconst c-doc-comment-start-regexp "Regexp to match the start of documentation comments." - t "a\\`" ; Doesn't match anything. + t "\\`a\\`" ; Doesn't match anything. ;; From font-lock.el: `doxygen' uses /*! while others use /**. (c c++ objc) "/\\*[*!]" java "/\\*\\*" @@ -3112,7 +3112,7 @@ 'c-opt-op-identitier-prefix "Regexp matching a keyword that is followed by a colon, where the whole construct can precede a declaration. E.g. \"public:\" in C++." - t "a\\`" ; Doesn't match anything. + t "\\`a\\`" ; Doesn't match anything. c++ (c-make-keywords-re t (c-lang-const c-protection-kwds))) (c-lang-defvar c-decl-start-colon-kwd-re (c-lang-const c-decl-start-colon-kwd-re)) @@ -3309,7 +3309,7 @@ 'c-opt-op-identitier-prefix t (if (c-lang-const c-type-modifier-kwds) (concat (regexp-opt (c-lang-const c-type-modifier-kwds) t) "\\>") ;; Default to a regexp that never matches. - "a\\`") + "\\`a\\`") ;; Check that there's no "=" afterwards to avoid matching tokens ;; like "*=". (c objc) (concat "\\(" @@ -3347,7 +3347,7 @@ 'c-opt-op-identitier-prefix as the end of the operator. Identifier syntax is in effect when this is matched \(see `c-identifier-syntax-table')." t ;; Default to a regexp that never matches. - "a\\`" + "\\`a\\`" ;; Check that there's no "=" afterwards to avoid matching tokens ;; like "*=". (c objc) (concat "\\(\\*\\)" @@ -3506,7 +3506,7 @@ 'c-opt-op-identitier-prefix (c-lang-defconst c-pre-id-bracelist-key "A regexp matching tokens which, preceding an identifier, signify a bracelist. " - t "a\\`" ; Doesn't match anything. + t "\\`a\\`" ; Doesn't match anything. 
c++ "new\\([^[:alnum:]_$]\\|$\\)\\|&&?\\(\\S.\\|$\\)") (c-lang-defvar c-pre-id-bracelist-key (c-lang-const c-pre-id-bracelist-key)) @@ -3562,7 +3562,7 @@ 'c-opt-op-identitier-prefix ;; before the '{' of the enum list, to avoid searching too far. "[^][{};/#=]*" "{") - "a\\`")) ; Doesn't match anything. + "\\`a\\`")) ; Doesn't match anything. (c-lang-defvar c-enum-clause-introduction-re (c-lang-const c-enum-clause-introduction-re)) @@ -3678,7 +3678,7 @@ 'c-opt-op-identitier-prefix "Regexp matching things that can't occur two symbols before a colon in a label construct. This catches C++'s inheritance construct \"class foo : bar\". Only used if `c-recognize-colon-labels' is set." - t "a\\`" ; Doesn't match anything. + t "\\`a\\`" ; Doesn't match anything. c++ (c-make-keywords-re t '("class"))) (c-lang-defvar c-nonlabel-token-2-key (c-lang-const c-nonlabel-token-2-key)) diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el index bd62fc754a..57cfdb96b3 100644 --- a/lisp/progmodes/cc-mode.el +++ b/lisp/progmodes/cc-mode.el @@ -1825,7 +1825,7 @@ c-doc-fl-decl-start ;; by `c-doc-line-join-re'), return the position of the first line of the ;; sequence. Otherwise, return nil. Point has no significance at entry to ;; and exit from this function. - (when (not (equal c-doc-line-join-re "a\\`")) + (when (not (equal c-doc-line-join-re "\\`a\\`")) (goto-char pos) (back-to-indentation) (and (or (looking-at c-comment-start-regexp) @@ -1842,7 +1842,7 @@ c-doc-fl-decl-end ;; marker (as defined by `c-doc-line-join-re), return the position of ;; the BOL at the end of the sequence. Otherwise, return nil. Point has no ;; significance at entry to and exit from this function. 
- (when (not (equal c-doc-line-join-re "a\\`")) + (when (not (equal c-doc-line-join-re "\\`a\\`")) (goto-char pos) (back-to-indentation) (let ((here (point))) diff --git a/lisp/progmodes/cc-vars.el b/lisp/progmodes/cc-vars.el index 6e8acd4c0d..3b30d18872 100644 --- a/lisp/progmodes/cc-vars.el +++ b/lisp/progmodes/cc-vars.el @@ -1648,9 +1648,9 @@ c-asymmetry-fontification-flag :group 'c) ;; Initialize the next two to a regexp which never matches. -(defvar c-noise-macro-with-parens-name-re "a\\`") +(defvar c-noise-macro-with-parens-name-re "\\`a\\`") (make-variable-buffer-local 'c-noise-macro-with-parens-name-re) -(defvar c-noise-macro-name-re "a\\`") +(defvar c-noise-macro-name-re "\\`a\\`") (make-variable-buffer-local 'c-noise-macro-name-re) (defcustom c-noise-macro-names nil @@ -1682,7 +1682,7 @@ c-make-noise-macro-regexps ;; Convert `c-noise-macro-names' and `c-noise-macro-with-parens-names' into ;; `c-noise-macro-name-re' and `c-noise-macro-with-parens-name-re'. (setq c-noise-macro-with-parens-name-re - (cond ((null c-noise-macro-with-parens-names) "a\\`") ; Never matches. + (cond ((null c-noise-macro-with-parens-names) "\\`a\\`") ;Never matches. ((consp c-noise-macro-with-parens-names) (concat (regexp-opt c-noise-macro-with-parens-names t) "\\([^[:alnum:]_$]\\|$\\)")) @@ -1691,7 +1691,7 @@ c-make-noise-macro-regexps (t (error "c-make-noise-macro-regexps: \ c-noise-macro-with-parens-names is invalid: %s" c-noise-macro-with-parens-names)))) (setq c-noise-macro-name-re - (cond ((null c-noise-macro-names) "a\\`") ; Never matches anything. + (cond ((null c-noise-macro-names) "\\`a\\`") ; Never matches anything. 
((consp c-noise-macro-names) (concat (regexp-opt c-noise-macro-names t) "\\([^[:alnum:]_$]\\|$\\)")) diff --git a/lisp/progmodes/octave.el b/lisp/progmodes/octave.el index 52e5fd477f..96058bca14 100644 --- a/lisp/progmodes/octave.el +++ b/lisp/progmodes/octave.el @@ -1691,7 +1691,7 @@ octave-help-mode (eval-and-compile (require 'help-mode)) ;; Don't highlight `EXAMPLE' as elisp symbols by using a regexp that ;; can never match. - (setq-local help-xref-symbol-regexp "x\\`")) + (setq-local help-xref-symbol-regexp "\\`x\\`")) (defun octave-help (fn) "Display the documentation of FN." diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el index 6553a2799b..49698dd120 100644 --- a/lisp/textmodes/ispell.el +++ b/lisp/textmodes/ispell.el @@ -4016,7 +4016,7 @@ ispell-message (defun ispell-non-empty-string (string) (if (or (not string) (string-equal string "")) - "\\'\\`" ; An unmatchable string if string is null. + "\\`a\\`" ; An unmatchable string if string is null. (regexp-quote string))) -- 2.20.1
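The cc-defs.el hunk of this patch shows a recurring pattern: c-make-keywords-re builds a keyword regexp, and when there is nothing to match it falls back to an unmatchable regexp, wrapped in \( ... \) when ADORN is set, presumably so that code using group 1 sees the same group layout either way. The following is a minimal sketch of that pattern, assuming a hypothetical helper my-keywords-re built on regexp-opt; it is not the CC Mode function itself.

```elisp
(require 'regexp-opt)

;; Hypothetical helper illustrating the fallback pattern.
(defun my-keywords-re (keywords &optional adorn)
  "Return a regexp matching any of the strings in KEYWORDS.
With ADORN non-nil, the result is wrapped in a capturing group.
If KEYWORDS is empty, return a regexp that never matches,
also wrapped in a group when ADORN is non-nil."
  (if keywords
      ;; A non-nil PAREN makes regexp-opt emit a \\( ... \\) group.
      (regexp-opt keywords (and adorn t))
    ;; \`a\` cannot match: \` only succeeds at the start of the
    ;; text, and after consuming "a" we are no longer there.
    (if adorn "\\(\\`a\\`\\)" "\\`a\\`")))
```

With this shape, (regexp-opt-depth (my-keywords-re nil t)) and (regexp-opt-depth (my-keywords-re '("for" "if") t)) both return 1, so callers can rely on group 1 existing whether or not any keywords were supplied.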
* Re: regular expressions that match nothing
  2019-05-14 10:14 ` Mattias Engdegård
@ 2019-05-14 19:41   ` Stefan Monnier
  2019-05-15 16:21     ` Mattias Engdegård
  0 siblings, 1 reply; 37+ messages in thread

From: Stefan Monnier @ 2019-05-14 19:41 UTC
To: emacs-devel

> Thank you, and sorry about my bad initial attempt. I tried a few more,
> like [z-a], \c* and \sq, but these were no better. The distribution is
> decidedly bimodal; there seems to be no significant difference between
> the 'fast' ones,

Not surprised: the "fast" ones are the ones that the regexp engine
recognizes as "anchored", so the search is reduced to a looking-at.

> so I went with \`a\` in the attached patch.

Sounds good.

>> Of course this may be dependent on the internals of the specific
>> regexp library at hand. I do not know enough to judge. In fact I
>> believe that a solid regular expression library should provide a
>> specific regular expression that matches nothing with special but
>> easy treatment that guarantees best response time.
> We could add a standard constant for it, like unmatchable-regexp, so

Yes, please. I'd recommend a `regexp-` prefix for it.
[ And I'll carefully avoid having an opinion on the rest of the name. ]

> that at least people don't keep reinventing it.

Not only that, but I'm pretty sure casual users will find it much easier
to understand what's going on when they see `regexp-<foo>` than when
they bump into "\\`a\\`". I.e. the constant's name should work as
a good comment.


        Stefan
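Stefan's explanation can be checked by timing buffer-wide searches directly. The sketch below (all my- names are hypothetical) defines a named constant in the spirit of his suggestion, using regexp-unmatchable, the name adopted in the follow-up patch, when the running Emacs defines it and falling back to "\\`a\\`" otherwise. It then times repeated re-search-forward calls over a scratch buffer. Absolute numbers depend on the machine; only the relative difference between an unanchored pattern such as a\` and an anchored one such as \`a\` is meaningful.

```elisp
;; Named never-matching regexp: prefer `regexp-unmatchable' when the
;; running Emacs defines it, else fall back to the same pattern.
(defconst my-regexp-unmatchable
  (if (boundp 'regexp-unmatchable) regexp-unmatchable "\\`a\\`")
  "A regexp that never matches anything.")

(defun my-time-search (regexp repeats)
  "Run REPEATS buffer-wide searches for REGEXP; return elapsed seconds.
The searches run in a temporary buffer filled with sample text.
Signal an error if REGEXP ever matches."
  (with-temp-buffer
    ;; 2000 lines of sample text, so an unanchored scan has work to do.
    (dotimes (_ 2000)
      (insert "some ordinary text, with an a or two\n"))
    (let ((start (float-time)))
      (dotimes (_ repeats)
        ;; Each search starts again from the beginning of the buffer.
        (goto-char (point-min))
        (when (re-search-forward regexp nil t)
          (error "Regexp %S unexpectedly matched" regexp)))
      (- (float-time) start))))
```

Evaluating (list (my-time-search "a\\`" 100) (my-time-search my-regexp-unmatchable 100)) returns two durations in seconds. If the anchoring explanation holds, the second should be much smaller, since the engine can give up after trying the first position.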
* Re: regular expressions that match nothing
  2019-05-14 19:41 ` Stefan Monnier
@ 2019-05-15 16:21   ` Mattias Engdegård
  2019-05-15 19:41     ` Alan Mackenzie
    ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread

From: Mattias Engdegård @ 2019-05-15 16:21 UTC
To: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 454 bytes --]

On Tue, 2019-05-14 at 15:41 -0400, Stefan Monnier wrote:
> Yes, please. I'd recommend a `regexp-` prefix for it.

Well, since you asked so nicely!

> [ And I'll carefully avoid having an opinion on the rest of the name.
> ]

The correct name is obviously something like `regexp-empty', but I have
to concede that it might be misinterpreted. The attached patch uses
`regexp-unmatchable', which is reasonably descriptive. Better
suggestions are welcome.

[-- Attachment #2: 0001-Add-standard-unmatchable-regexp.patch --]
[-- Type: text/x-patch, Size: 20980 bytes --]

From df662cd9d2cf712b436a7b3d693ebf6bd189422f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Tue, 14 May 2019 11:43:49 +0200
Subject: [PATCH] Add standard unmatchable regexp

Add `regexp-unmatchable' as a standard unmatchable regexp, defined as
"\\`a\\`".  Use it where such a regexp is needed, replacing slower
expressions in several places.
From a suggestion by Philippe Schnoebelen.

* lisp/subr.el (regexp-unmatchable): New defconst.
* etc/NEWS (Lisp Changes): Mention `regexp-unmatchable'.
* doc/lispref/searching.texi (Regexp Functions): Document it.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt)
* lisp/progmodes/cc-defs.el (cc-conditional-require-after-load)
(c-make-keywords-re)
* lisp/progmodes/cc-engine.el (c-beginning-of-statement-1)
(c-forward-<>-arglist-recur, c-forward-decl-or-cast-1)
(c-looking-at-decl-block)
* lisp/progmodes/cc-fonts.el (c-doc-line-join-re)
(c-doc-bright-comment-start-re)
* lisp/progmodes/cc-langs.el (c-populate-syntax-table)
(c-assignment-op-regexp)
(c-block-comment-ender-regexp, c-font-lock-comment-end-skip)
(c-block-comment-start-regexp, c-line-comment-start-regexp)
(c-doc-comment-start-regexp, c-decl-start-colon-kwd-re)
(c-type-decl-prefix-key, c-type-decl-operator-prefix-key)
(c-pre-id-bracelist-key, c-enum-clause-introduction-re)
(c-nonlabel-token-2-key)
* lisp/progmodes/cc-mode.el (c-doc-fl-decl-start, c-doc-fl-decl-end)
* lisp/progmodes/cc-vars.el (c-noise-macro-with-parens-name-re)
(c-noise-macro-name-re, c-make-noise-macro-regexps)
* lisp/progmodes/octave.el (octave-help-mode)
* lisp/textmodes/ispell.el (ispell-non-empty-string)
* lisp/vc/vc-bzr.el (vc-bzr-log-view-mode, vc-bzr-revision-completion-table)
* lisp/vc/vc-git.el (vc-git-log-view-mode)
* lisp/vc/vc-hg.el (vc-hg-log-view-mode)
* lisp/vc/vc-mtn.el (vc-mtn-log-view-mode): Use `regexp-unmatchable'.
--- doc/lispref/searching.texi | 7 +++++++ etc/NEWS | 4 ++++ lisp/emacs-lisp/regexp-opt.el | 4 ++-- lisp/net/ange-ftp.el | 2 +- lisp/progmodes/cc-defs.el | 6 +++--- lisp/progmodes/cc-engine.el | 10 ++++++---- lisp/progmodes/cc-fonts.el | 4 ++-- lisp/progmodes/cc-langs.el | 26 +++++++++++++------------- lisp/progmodes/cc-mode.el | 4 ++-- lisp/progmodes/cc-vars.el | 8 ++++---- lisp/progmodes/grep.el | 3 ++- lisp/progmodes/octave.el | 2 +- lisp/subr.el | 4 ++++ lisp/textmodes/ispell.el | 2 +- lisp/vc/vc-bzr.el | 5 +++-- lisp/vc/vc-git.el | 2 +- lisp/vc/vc-hg.el | 2 +- lisp/vc/vc-mtn.el | 2 +- 18 files changed, 58 insertions(+), 39 deletions(-) diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi index 8775254dd0..24f30b4dac 100644 --- a/doc/lispref/searching.texi +++ b/doc/lispref/searching.texi @@ -1070,6 +1070,13 @@ Regexp Functions @c Internal functions: regexp-opt-group +@defvar regexp-unmatchable +This variable contains a regexp that is guaranteed not to match any +string at all. It is particularly useful as default value for +variables that may be set to a pattern that actually matches +something. +@end defvar + @node Regexp Search @section Regular Expression Searching @cindex regular expression searching diff --git a/etc/NEWS b/etc/NEWS index fc3ca1ea92..699a04b524 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1990,6 +1990,10 @@ returns a regexp that never matches anything, which is an identity for this operation. Previously, the empty string was returned in this case. +** New constant 'regexp-unmatchable' contains a never-matching regexp. +It is a convenient and readable way to specify a regexp that should +not match anything, and is as fast as any such regexp can be. 
+ \f * Changes in Emacs 27.1 on Non-Free Operating Systems diff --git a/lisp/emacs-lisp/regexp-opt.el b/lisp/emacs-lisp/regexp-opt.el index d883752d71..00f72e284a 100644 --- a/lisp/emacs-lisp/regexp-opt.el +++ b/lisp/emacs-lisp/regexp-opt.el @@ -144,9 +144,9 @@ regexp-opt (sort (copy-sequence strings) 'string-lessp))) (re (cond - ;; No strings: return a\` which cannot match anything. + ;; No strings: return an unmatchable regexp. ((null strings) - (concat (or open "\\(?:") "a\\`\\)")) + (concat (or open "\\(?:") regexp-unmatchable "\\)")) ;; If we cannot reorder, give up all attempts at ;; optimisation. There is room for improvement (Bug#34641). ((and keep-order (regexp-opt--contains-prefix sorted-strings)) diff --git a/lisp/net/ange-ftp.el b/lisp/net/ange-ftp.el index 5af9ea75ed..b0a1e1799f 100644 --- a/lisp/net/ange-ftp.el +++ b/lisp/net/ange-ftp.el @@ -1989,7 +1989,7 @@ internal-ange-ftp-mode (make-local-variable 'comint-password-prompt-regexp) ;; This is a regexp that can't match anything. ;; ange-ftp has its own ways of handling passwords. - (setq comint-password-prompt-regexp "\\`a\\`") + (setq comint-password-prompt-regexp regexp-unmatchable) (make-local-variable 'paragraph-start) (setq paragraph-start comint-prompt-regexp)) diff --git a/lisp/progmodes/cc-defs.el b/lisp/progmodes/cc-defs.el index cd4ed6b352..d20e3ef32d 100644 --- a/lisp/progmodes/cc-defs.el +++ b/lisp/progmodes/cc-defs.el @@ -81,7 +81,7 @@ (progn (require 'font-lock) (let (font-lock-keywords) - (font-lock-compile-keywords '("a\\`")) ; doesn't match anything. + (font-lock-compile-keywords (list regexp-unmatchable)) font-lock-keywords)))) \f @@ -1890,8 +1890,8 @@ c-make-keywords-re ;; Produce a regexp that doesn't match anything. 
(if adorn - "\\(a\\`\\)" - "a\\`"))) + (concat "\\(" regexp-unmatchable "\\)") + regexp-unmatchable))) (put 'c-make-keywords-re 'lisp-indent-function 1) diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el index ed8310d0e6..41bab270da 100644 --- a/lisp/progmodes/cc-engine.el +++ b/lisp/progmodes/cc-engine.el @@ -907,7 +907,7 @@ c-beginning-of-statement-1 stack ;; Regexp which matches "for", "if", etc. (cond-key (or c-opt-block-stmt-key - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) ;; Return value. (ret 'same) ;; Positions of the last three sexps or bounds we've stopped at. @@ -7646,7 +7646,9 @@ c-forward-<>-arglist-recur (progn (c-forward-syntactic-ws) (when (or (and c-record-type-identifiers all-types) - (not (equal c-inside-<>-type-key "\\(a\\`\\)"))) + (not (equal c-inside-<>-type-key + (concat + "\\(" regexp-unmatchable "\\)")))) (c-forward-syntactic-ws) (cond ((eq (char-after) ??) @@ -9253,7 +9255,7 @@ c-forward-decl-or-cast-1 ;; Skip over type decl prefix operators. (Note similar code in ;; `c-forward-declarator'.) (if (and c-recognize-typeless-decls - (equal c-type-decl-prefix-key "a\\`")) ; Regexp which doesn't match + (equal c-type-decl-prefix-key regexp-unmatchable)) (when (eq (char-after) ?\() (progn (setq paren-depth (1+ paren-depth)) @@ -10886,7 +10888,7 @@ c-looking-at-decl-block ;; legal because it's part of a "compound keyword" like ;; "enum class". Of course, if c-after-brace-list-key ;; is nil, we can skip the test. - (or (equal c-after-brace-list-key "a\\`") ; Regexp which doesn't match + (or (equal c-after-brace-list-key regexp-unmatchable) (save-match-data (save-excursion (not diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el index 5f09be60a6..b3a9dd480b 100644 --- a/lisp/progmodes/cc-fonts.el +++ b/lisp/progmodes/cc-fonts.el @@ -2580,14 +2580,14 @@ pike-font-lock-keywords \f ;;; Doc comments. 
-(defvar c-doc-line-join-re "a\\`") +(defvar c-doc-line-join-re regexp-unmatchable) ;; Matches a join of two lines in a doc comment. ;; This should not be changed directly, but instead set by ;; `c-setup-doc-comment-style'. This variable is used in `c-find-decl-spots' ;; in (e.g.) autodoc style comments to bridge the gap between a "@\n" at an ;; EOL and the token following "//!" on the next line. -(defvar c-doc-bright-comment-start-re "a\\`") +(defvar c-doc-bright-comment-start-re regexp-unmatchable) ;; Matches the start of a "bright" comment, one whose contents may be ;; fontified by, e.g., `c-font-lock-declarations'. diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el index 30da10a6c0..9d2f689e58 100644 --- a/lisp/progmodes/cc-langs.el +++ b/lisp/progmodes/cc-langs.el @@ -945,7 +945,7 @@ c-populate-syntax-table (c-make-keywords-re 'appendable (c-lang-const c-cpp-include-directives)) "[ \t]*") - "a\\`")) ; Doesn't match anything + regexp-unmatchable)) (c-lang-defvar c-cpp-include-key (c-lang-const c-cpp-include-key)) (c-lang-defconst c-opt-cpp-macro-define @@ -1331,7 +1331,7 @@ 'c-opt-op-identitier-prefix (c--set-difference (c-lang-const c-assignment-operators) '("=") :test 'string-equal))) - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) (c-lang-defvar c-assignment-op-regexp (c-lang-const c-assignment-op-regexp)) @@ -1554,7 +1554,7 @@ 'c-opt-op-identitier-prefix ;; language) t (if (c-lang-const c-block-comment-ender) (regexp-quote (c-lang-const c-block-comment-ender)) - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) (c-lang-defvar c-block-comment-ender-regexp (c-lang-const c-block-comment-ender-regexp)) @@ -1565,7 +1565,7 @@ 'c-opt-op-identitier-prefix ;; `font-lock-comment-delimiter-face'. t (if (c-lang-const c-block-comment-ender) (concat "[ \t]*" (c-lang-const c-block-comment-ender-regexp)) - "a\\`")) ; Doesn't match anything. 
+ regexp-unmatchable)) (c-lang-setvar font-lock-comment-end-skip (c-lang-const c-font-lock-comment-end-skip)) @@ -1584,7 +1584,7 @@ 'c-opt-op-identitier-prefix ;; language) t (if (c-lang-const c-block-comment-starter) (regexp-quote (c-lang-const c-block-comment-starter)) - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) (c-lang-defvar c-block-comment-start-regexp (c-lang-const c-block-comment-start-regexp)) @@ -1593,7 +1593,7 @@ 'c-opt-op-identitier-prefix ;; language; it does in all 7 CC Mode languages). t (if (c-lang-const c-line-comment-starter) (regexp-quote (c-lang-const c-line-comment-starter)) - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) (c-lang-defvar c-line-comment-start-regexp (c-lang-const c-line-comment-start-regexp)) @@ -1628,7 +1628,7 @@ 'c-opt-op-identitier-prefix (c-lang-defconst c-doc-comment-start-regexp "Regexp to match the start of documentation comments." - t "a\\`" ; Doesn't match anything. + t regexp-unmatchable ;; From font-lock.el: `doxygen' uses /*! while others use /**. (c c++ objc) "/\\*[*!]" java "/\\*\\*" @@ -3112,7 +3112,7 @@ 'c-opt-op-identitier-prefix "Regexp matching a keyword that is followed by a colon, where the whole construct can precede a declaration. E.g. \"public:\" in C++." - t "a\\`" ; Doesn't match anything. + t regexp-unmatchable c++ (c-make-keywords-re t (c-lang-const c-protection-kwds))) (c-lang-defvar c-decl-start-colon-kwd-re (c-lang-const c-decl-start-colon-kwd-re)) @@ -3309,7 +3309,7 @@ 'c-opt-op-identitier-prefix t (if (c-lang-const c-type-modifier-kwds) (concat (regexp-opt (c-lang-const c-type-modifier-kwds) t) "\\>") ;; Default to a regexp that never matches. - "a\\`") + regexp-unmatchable) ;; Check that there's no "=" afterwards to avoid matching tokens ;; like "*=". (c objc) (concat "\\(" @@ -3347,7 +3347,7 @@ 'c-opt-op-identitier-prefix as the end of the operator. Identifier syntax is in effect when this is matched \(see `c-identifier-syntax-table')." 
t ;; Default to a regexp that never matches. - "a\\`" + regexp-unmatchable ;; Check that there's no "=" afterwards to avoid matching tokens ;; like "*=". (c objc) (concat "\\(\\*\\)" @@ -3506,7 +3506,7 @@ 'c-opt-op-identitier-prefix (c-lang-defconst c-pre-id-bracelist-key "A regexp matching tokens which, preceding an identifier, signify a bracelist. " - t "a\\`" ; Doesn't match anything. + t regexp-unmatchable c++ "new\\([^[:alnum:]_$]\\|$\\)\\|&&?\\(\\S.\\|$\\)") (c-lang-defvar c-pre-id-bracelist-key (c-lang-const c-pre-id-bracelist-key)) @@ -3562,7 +3562,7 @@ 'c-opt-op-identitier-prefix ;; before the '{' of the enum list, to avoid searching too far. "[^][{};/#=]*" "{") - "a\\`")) ; Doesn't match anything. + regexp-unmatchable)) (c-lang-defvar c-enum-clause-introduction-re (c-lang-const c-enum-clause-introduction-re)) @@ -3703,7 +3703,7 @@ 'c-opt-op-identitier-prefix "Regexp matching things that can't occur two symbols before a colon in a label construct. This catches C++'s inheritance construct \"class foo : bar\". Only used if `c-recognize-colon-labels' is set." - t "a\\`" ; Doesn't match anything. + t regexp-unmatchable c++ (c-make-keywords-re t '("class"))) (c-lang-defvar c-nonlabel-token-2-key (c-lang-const c-nonlabel-token-2-key)) diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el index bd62fc754a..e4ff9f019d 100644 --- a/lisp/progmodes/cc-mode.el +++ b/lisp/progmodes/cc-mode.el @@ -1825,7 +1825,7 @@ c-doc-fl-decl-start ;; by `c-doc-line-join-re'), return the position of the first line of the ;; sequence. Otherwise, return nil. Point has no significance at entry to ;; and exit from this function. - (when (not (equal c-doc-line-join-re "a\\`")) + (when (not (equal c-doc-line-join-re regexp-unmatchable)) (goto-char pos) (back-to-indentation) (and (or (looking-at c-comment-start-regexp) @@ -1842,7 +1842,7 @@ c-doc-fl-decl-end ;; marker (as defined by `c-doc-line-join-re), return the position of ;; the BOL at the end of the sequence. 
Otherwise, return nil. Point has no ;; significance at entry to and exit from this function. - (when (not (equal c-doc-line-join-re "a\\`")) + (when (not (equal c-doc-line-join-re regexp-unmatchable)) (goto-char pos) (back-to-indentation) (let ((here (point))) diff --git a/lisp/progmodes/cc-vars.el b/lisp/progmodes/cc-vars.el index 6e8acd4c0d..b818bced73 100644 --- a/lisp/progmodes/cc-vars.el +++ b/lisp/progmodes/cc-vars.el @@ -1648,9 +1648,9 @@ c-asymmetry-fontification-flag :group 'c) ;; Initialize the next two to a regexp which never matches. -(defvar c-noise-macro-with-parens-name-re "a\\`") +(defvar c-noise-macro-with-parens-name-re regexp-unmatchable) (make-variable-buffer-local 'c-noise-macro-with-parens-name-re) -(defvar c-noise-macro-name-re "a\\`") +(defvar c-noise-macro-name-re regexp-unmatchable) (make-variable-buffer-local 'c-noise-macro-name-re) (defcustom c-noise-macro-names nil @@ -1682,7 +1682,7 @@ c-make-noise-macro-regexps ;; Convert `c-noise-macro-names' and `c-noise-macro-with-parens-names' into ;; `c-noise-macro-name-re' and `c-noise-macro-with-parens-name-re'. (setq c-noise-macro-with-parens-name-re - (cond ((null c-noise-macro-with-parens-names) "a\\`") ; Never matches. + (cond ((null c-noise-macro-with-parens-names) regexp-unmatchable) ((consp c-noise-macro-with-parens-names) (concat (regexp-opt c-noise-macro-with-parens-names t) "\\([^[:alnum:]_$]\\|$\\)")) @@ -1691,7 +1691,7 @@ c-make-noise-macro-regexps (t (error "c-make-noise-macro-regexps: \ c-noise-macro-with-parens-names is invalid: %s" c-noise-macro-with-parens-names)))) (setq c-noise-macro-name-re - (cond ((null c-noise-macro-names) "a\\`") ; Never matches anything. 
+ (cond ((null c-noise-macro-names) regexp-unmatchable) ((consp c-noise-macro-names) (concat (regexp-opt c-noise-macro-names t) "\\([^[:alnum:]_$]\\|$\\)")) diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el index 85f9078d46..79178c4346 100644 --- a/lisp/progmodes/grep.el +++ b/lisp/progmodes/grep.el @@ -837,7 +837,8 @@ grep-mode grep-mode-line-matches) ;; compilation-directory-matcher can't be nil, so we set it to a regexp that ;; can never match. - (set (make-local-variable 'compilation-directory-matcher) '("\\`a\\`")) + (set (make-local-variable 'compilation-directory-matcher) + (list regexp-unmatchable)) (set (make-local-variable 'compilation-process-setup-function) 'grep-process-setup) (set (make-local-variable 'compilation-disable-input) t) diff --git a/lisp/progmodes/octave.el b/lisp/progmodes/octave.el index 52e5fd477f..8a7e24e5ad 100644 --- a/lisp/progmodes/octave.el +++ b/lisp/progmodes/octave.el @@ -1691,7 +1691,7 @@ octave-help-mode (eval-and-compile (require 'help-mode)) ;; Don't highlight `EXAMPLE' as elisp symbols by using a regexp that ;; can never match. - (setq-local help-xref-symbol-regexp "x\\`")) + (setq-local help-xref-symbol-regexp regexp-unmatchable)) (defun octave-help (fn) "Display the documentation of FN." diff --git a/lisp/subr.el b/lisp/subr.el index be21dc67a0..05fb9fea68 100644 --- a/lisp/subr.el +++ b/lisp/subr.el @@ -5544,4 +5544,8 @@ flatten-tree ;; for discoverability: (defalias 'flatten-list 'flatten-tree) +;; The initial anchoring is for better performance in searching matches. 
+(defconst regexp-unmatchable "\\`a\\`" + "Standard regexp guaranteed not to match any string at all.") + ;;; subr.el ends here diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el index 6553a2799b..0c5e6a183b 100644 --- a/lisp/textmodes/ispell.el +++ b/lisp/textmodes/ispell.el @@ -4016,7 +4016,7 @@ ispell-message (defun ispell-non-empty-string (string) (if (or (not string) (string-equal string "")) - "\\'\\`" ; An unmatchable string if string is null. + regexp-unmatchable (regexp-quote string))) diff --git a/lisp/vc/vc-bzr.el b/lisp/vc/vc-bzr.el index ab5a449cd3..ee1646cae5 100644 --- a/lisp/vc/vc-bzr.el +++ b/lisp/vc/vc-bzr.el @@ -702,7 +702,7 @@ vc-bzr-log-view-mode (remove-hook 'log-view-mode-hook 'vc-bzr-log-view-mode) ;Deactivate the hack. (require 'add-log) (set (make-local-variable 'log-view-per-file-logs) nil) - (set (make-local-variable 'log-view-file-re) "\\`a\\`") + (set (make-local-variable 'log-view-file-re) regexp-unmatchable) (set (make-local-variable 'log-view-message-re) (if (eq vc-log-view-type 'short) "^ *\\([0-9.]+\\): \\(.*?\\)[ \t]+\\([0-9]\\{4\\}-[0-9]\\{2\\}-[0-9]\\{2\\}\\)\\( \\[merge\\]\\)?" @@ -1319,7 +1319,8 @@ vc-bzr-revision-completion-table ((string-match "\\`annotate:" string) (completion-table-with-context (substring string 0 (match-end 0)) - (apply-partially #'completion-table-with-terminator '(":" . "\\`a\\`") + (apply-partially #'completion-table-with-terminator + (cons ":" regexp-unmatchable) #'completion-file-name-table) (substring string (match-end 0)) pred action)) diff --git a/lisp/vc/vc-git.el b/lisp/vc/vc-git.el index 192e6cf68f..61c13026cc 100644 --- a/lisp/vc/vc-git.el +++ b/lisp/vc/vc-git.el @@ -1087,7 +1087,7 @@ log-view-expanded-log-entry-function (define-derived-mode vc-git-log-view-mode log-view-mode "Git-Log-View" (require 'add-log) ;; We need the faces add-log. ;; Don't have file markers, so use impossible regexp. 
- (set (make-local-variable 'log-view-file-re) "\\`a\\`") + (set (make-local-variable 'log-view-file-re) regexp-unmatchable) (set (make-local-variable 'log-view-per-file-logs) nil) (set (make-local-variable 'log-view-message-re) (if (not (eq vc-log-view-type 'long)) diff --git a/lisp/vc/vc-hg.el b/lisp/vc/vc-hg.el index d3f132dae7..876d824cea 100644 --- a/lisp/vc/vc-hg.el +++ b/lisp/vc/vc-hg.el @@ -444,7 +444,7 @@ log-view-expanded-log-entry-function (define-derived-mode vc-hg-log-view-mode log-view-mode "Hg-Log-View" (require 'add-log) ;; we need the add-log faces - (set (make-local-variable 'log-view-file-re) "\\`a\\`") + (set (make-local-variable 'log-view-file-re) regexp-unmatchable) (set (make-local-variable 'log-view-per-file-logs) nil) (set (make-local-variable 'log-view-message-re) (if (eq vc-log-view-type 'short) diff --git a/lisp/vc/vc-mtn.el b/lisp/vc/vc-mtn.el index f0b12489c1..91cc28021c 100644 --- a/lisp/vc/vc-mtn.el +++ b/lisp/vc/vc-mtn.el @@ -240,7 +240,7 @@ log-view-per-file-logs (define-derived-mode vc-mtn-log-view-mode log-view-mode "Mtn-Log-View" ;; Don't match anything. - (set (make-local-variable 'log-view-file-re) "\\`a\\`") + (set (make-local-variable 'log-view-file-re) regexp-unmatchable) (set (make-local-variable 'log-view-per-file-logs) nil) ;; TODO: Use a more precise regexp than "[ |/]+" to avoid false positives ;; in the ChangeLog text. -- 2.20.1 ^ permalink raw reply related [flat|nested] 37+ messages in thread
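The `regexp-unmatchable` constant added to subr.el can be checked directly. This is a minimal sketch. It uses `regexp-unmatchable` where it is defined (Emacs 27 and later) and otherwise falls back to the same literal. `my-unmatchable-re` and `my-unmatchable-demo` are hypothetical names used only for illustration. The sketch confirms that the regexp fails both against strings and in a buffer search. According to the subr.el comment, the leading `\\`` is there for search performance. The sketch also shows why the older ispell spelling `"\\'\\`"` is not strictly unmatchable.

```elisp
;; Hypothetical name; falls back to the same literal on older Emacsen.
(defconst my-unmatchable-re
  (if (boundp 'regexp-unmatchable) regexp-unmatchable "\\`a\\`")
  "Regexp that can never match.")

(defun my-unmatchable-demo ()
  "Return match results that should all be nil.
The \"a\" would have to sit between two start-of-text anchors, which
is impossible once it has consumed a character."
  (list (string-match-p my-unmatchable-re "")
        (string-match-p my-unmatchable-re "a")
        (string-match-p my-unmatchable-re "banana")
        (with-temp-buffer
          (insert "a\na\na\n")
          (goto-char (point-min))
          (re-search-forward my-unmatchable-re nil t))))

(my-unmatchable-demo)          ; => (nil nil nil nil)

;; By contrast, "\\'\\`" matches the empty string: on "" both the
;; end anchor and the start anchor hold at position 0.
(string-match-p "\\'\\`" "")   ; => 0
```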
* Re: regular expressions that match nothing 2019-05-15 16:21 ` Mattias Engdegård @ 2019-05-15 19:41 ` Alan Mackenzie 2019-05-16 10:54 ` Mattias Engdegård 2019-05-15 20:17 ` Michael Heerdegen 2019-05-16 18:12 ` regular expressions that match nothing Eric Abrahamsen 2 siblings, 1 reply; 37+ messages in thread From: Alan Mackenzie @ 2019-05-15 19:41 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel Hello, Mattias. On Wed, May 15, 2019 at 18:21:07 +0200, Mattias Engdegård wrote: > On Tue, 2019-05-14 at 15:41 -0400, Stefan Monnier wrote: > > Yes, please. I'd recommend a `regexp-` prefix for it. > Well, since you asked so nicely! > > [ And I'll carefully avoid having an opinion on the rest of the name. > > ] > The correct name is obviously something like `regexp-empty', but I have > to concede that it might be misinterpreted. The attached patch uses > `regexp-unmatchable' which is reasonably descriptive. Better > suggestions are welcome. I think regexp-unmatchable is too much of a mouthful. It is more difficult to type than a\\` (or whatever), even after having to think where the seldom-used keys are on the keyboard. Also it is difficult to spell. Is it unmatchable or unmatcheable? I would suggest re-nomatch (or possibly nomatch-re), which is just 10 characters, as opposed to your suggestion, which is 18. Quite possibly, re-nomatch is easier to type than a\\` (or whatever). This ease of typing is important, because it encourages hackers to use it rather than typing out the shorter thing. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-15 19:41 ` Alan Mackenzie @ 2019-05-16 10:54 ` Mattias Engdegård 2019-05-16 23:18 ` Phil Sainty 0 siblings, 1 reply; 37+ messages in thread From: Mattias Engdegård @ 2019-05-16 10:54 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel On 15 May 2019, at 21:41, Alan Mackenzie <acm@muc.de> wrote: > > I think regexp-unmatchable is too much of a mouthful. It is more > difficult to type than a\\` (or whatever), even after having to think > where the seldom-used keys are on the keyboard. Also it is difficult to > spell. Is it unmatchable or unmatcheable? > > I would suggest re-nomatch (or possibly nomatch-re), which is just 10 > characters, as opposed to your suggestion, which is 18. Quite possibly, > re-nomatch is easier to type than a\\` (or whatever). Thanks, Alan, and you may have a point. I'm definitely not against a better name if there is a consensus for it. Let me just dispassionately note that: 1. As 'match' does not end in 'e', there is no more reason to write 'unmatcheable' than 'undrinkeable'. 2. (rx (or)) is even shorter than re-nomatch, and is very memorable. (rx (|)) is shorter still. 3. Lisp tradition is unafraid of the verbose, partly because `-' is allowed in identifiers, which lowers the friction. 4. The point of this name isn't to be shorter than the regexp string it represents, but to be more readable and avoid mistakes and substandard reinventions. Start bikeshedding; I'll try to low-pass filter. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-16 10:54 ` Mattias Engdegård @ 2019-05-16 23:18 ` Phil Sainty 2019-05-17 9:43 ` Alan Mackenzie 0 siblings, 1 reply; 37+ messages in thread From: Phil Sainty @ 2019-05-16 23:18 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Alan Mackenzie, Stefan Monnier, emacs-devel > On 15 May 2019, at 21:41, Alan Mackenzie <acm@muc.de> wrote: >> I think regexp-unmatchable is too much of a mouthful. I like it, myself. I think the meaning is 100% clear and unambiguous for the reader (which I can't say about the alternative suggestions that I've seen). Are we expecting this to be used so much that we're prioritising brevity over clarity? (That's a genuine question -- I have a similar definition in my own config, and I have exactly one use for it.) On 2019-05-16 22:54, Mattias Engdegård wrote: > 4. The point of this name isn't to be shorter than the regexp string > it represents, but to be more readable and avoid mistakes and > substandard reinventions. Quite. > 2. (rx (or)) is even shorter than re-nomatch, and is very memorable. > (rx (|)) is shorter still. I don't think those are much better than people using "a\\`". *Surely* `rx` can simply acquire a symbol for this? (rx unmatchable) or similar? -Phil ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-16 23:18 ` Phil Sainty @ 2019-05-17 9:43 ` Alan Mackenzie 2019-05-17 10:17 ` Mattias Engdegård 2019-05-17 12:53 ` Stefan Monnier 0 siblings, 2 replies; 37+ messages in thread From: Alan Mackenzie @ 2019-05-17 9:43 UTC (permalink / raw) To: Phil Sainty; +Cc: Mattias Engdegård, Stefan Monnier, emacs-devel Hello, Phil. On Fri, May 17, 2019 at 11:18:49 +1200, Phil Sainty wrote: > > On 15 May 2019, at 21:41, Alan Mackenzie <acm@muc.de> wrote: > >> I think regexp-unmatchable is too much of a mouthful. > I like it, myself. I think the meaning is 100% clear and unambiguous > for the reader (which I can't say about the alternative suggestions > that I've seen). I find it too difficult to read. My brain simply doesn't recognise it instantly, the way it would re-nomatch. At the moment, given there is no similar symbol name in Emacs, this is less urgent, but if more similar long symbols were introduced this would be a pain - a minor pain yes, but a pain nevertheless. > Are we expecting this to be used so much that we're prioritising > brevity over clarity? Brevity is clarity - up to a point. We write `defun', not `define-function'. Who would argue that the latter of these is clearer? Why has nobody commented on my suggestion of using re- rather than regexp- as the prefix? We already have re-search-forward. > (That's a genuine question -- I have a similar definition in my own > config, and I have exactly one use for it.) There are quite a few uses of "a\\`" in CC Mode. If they were to be replaced by regexp-unmatchable, I might have to re-flow the code, to avoid it going too far over 80 columns. > On 2019-05-16 22:54, Mattias Engdegård wrote: > > 4. The point of this name isn't to be shorter than the regexp string > > it represents, but to be more readable and avoid mistakes and > > substandard reinventions. > Quite. > > 2. (rx (or)) is even shorter than re-nomatch, and is very memorable. > > (rx (|)) is shorter still.
This is undesirable in source files which don't otherwise use rx. It's also cryptic, forcing some readers to do research. > I don't think those are much better than people using "a\\`". > *Surely* `rx` can simply acquire a symbol for this? > (rx unmatchable) or similar? > -Phil -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-17 9:43 ` Alan Mackenzie @ 2019-05-17 10:17 ` Mattias Engdegård 2019-05-17 12:53 ` Stefan Monnier 1 sibling, 0 replies; 37+ messages in thread From: Mattias Engdegård @ 2019-05-17 10:17 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Phil Sainty, Stefan Monnier, emacs-devel On 17 May 2019, at 11:43, Alan Mackenzie <acm@muc.de> wrote: > > There are quite a few uses of "a\\`" in CC Mode. If they were to be > replaced by regexp-unmatchable, I might have to re-flow the code, to > avoid it going too far over 80 columns. Already did that (sorry), without much re-flowing needed. Many lines actually became shorter. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-17 9:43 ` Alan Mackenzie 2019-05-17 10:17 ` Mattias Engdegård @ 2019-05-17 12:53 ` Stefan Monnier 1 sibling, 0 replies; 37+ messages in thread From: Stefan Monnier @ 2019-05-17 12:53 UTC (permalink / raw) To: emacs-devel > similar long symbols were introduced this would be a pain - a minor pain > yes, but a pain nevertheless. While it's long, it's shorter than the cryptic regexp itself if you take into account the comment that had to accompany the regexp to explain it and that now is made unnecessary because the name is self-explanatory. > Why has nobody commented on my suggestion of using re- rather than > regexp- as the prefix? We already have re-search-forward. I recommended "regexp-". I like brevity and plead guilty of preferring "-re" (over "-regexp") suffixes in variable names. Yet I recommended "regexp-" because it seems to be what Elisp is standardizing on (e.g. compare `C-h o re- TAB` and `C-h o regexp- TAB`; or look at the number of variables that were renamed from "-re" to "-regexp") and I think uniformity and understandability here trumps saving a few chars. Especially because this is not used very often at all. Of course, feel free to define `c-nomatch` ;-) My own favorite is <prefix>-∅ which I believe is the shortest of all the suggestions I've seen pass by; yet I don't think we want to go that way. Clearly, this is prime bikeshedding material, so I'll just support the code author's choice because he was the one who finally did it, after all these years where it's been mentioned as something we could/should do. Stefan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-15 16:21 ` Mattias Engdegård 2019-05-15 19:41 ` Alan Mackenzie @ 2019-05-15 20:17 ` Michael Heerdegen 2019-05-15 21:06 ` Stefan Monnier 2019-05-15 21:07 ` Mattias Engdegård 2019-05-16 18:12 ` regular expressions that match nothing Eric Abrahamsen 2 siblings, 2 replies; 37+ messages in thread From: Michael Heerdegen @ 2019-05-15 20:17 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel Mattias Engdegård <mattiase@acm.org> writes: > The correct name is obviously something like `regexp-empty', but I have > to concede that it might be misinterpreted. The attached patch uses > `regexp-unmatchable' which is reasonably descriptive. Better > suggestions are welcome. Should there be an rx regexp form for this? Michael. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-15 20:17 ` Michael Heerdegen @ 2019-05-15 21:06 ` Stefan Monnier 2019-05-15 21:07 ` Mattias Engdegård 1 sibling, 0 replies; 37+ messages in thread From: Stefan Monnier @ 2019-05-15 21:06 UTC (permalink / raw) To: Michael Heerdegen; +Cc: Mattias Engdegård, emacs-devel > Should there be an rx regexp form for this? (or) Stefan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: regular expressions that match nothing 2019-05-15 20:17 ` Michael Heerdegen 2019-05-15 21:06 ` Stefan Monnier @ 2019-05-15 21:07 ` Mattias Engdegård 2019-05-15 21:38 ` Michael Heerdegen ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Mattias Engdegård @ 2019-05-15 21:07 UTC (permalink / raw) To: Michael Heerdegen; +Cc: Stefan Monnier, emacs-devel [-- Attachment #1: Type: text/plain, Size: 302 bytes --] On 15 May 2019, at 22:17, Michael Heerdegen <michael_heerdegen@web.de> wrote: > > Should there be an rx regexp form for this? We don't necessarily need a special form for it; we can just make `(or)' work. Proposed patch attached. (I also added its dual, (seq), since it would be silly not to.) [-- Attachment #2: 0001-Allow-zero-argument-rx-or-and-seq-forms.patch --] [-- Type: application/octet-stream, Size: 3884 bytes --] From b7706f5b398bb360ac1405842efe852ca89b9de8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org> Date: Wed, 15 May 2019 22:44:00 +0200 Subject: [PATCH] Allow zero-argument rx `or' and `seq' forms Make the rx `or' and `seq' forms accept zero arguments to produce a never-matching regexp and an empty string, respectively. * lisp/emacs-lisp/rx.el (rx-constituents, rx-or): Permit zero args. (rx): Amend doc string for `or' and `seq'. * test/lisp/emacs-lisp/rx-tests.el (rx-or, rx-seq): Test the change. * etc/NEWS (Changes in Specialized Modes and Packages): Mention the change. --- etc/NEWS | 5 +++++ lisp/emacs-lisp/rx.el | 13 ++++++++----- test/lisp/emacs-lisp/rx-tests.el | 8 +++++++- 3 files changed, 20 insertions(+), 6 deletions(-) diff --git a/etc/NEWS b/etc/NEWS index 699a04b524..5f3468596b 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1310,6 +1310,11 @@ when given in a string. Previously, '(any "\x80-\xff")' would match characters U+0080...U+00FF. Now the expression matches raw bytes in the 128...255 range, as expected. +*** The rx 'or' and 'seq' forms no longer require any arguments.
+The zero-argument forms (or) and (seq) are now permitted: (or) +produces a regexp that never matches anything, while (seq) produces +the empty string, each being an identity for its operation. + ** Frames +++ diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el index fdd24317c6..5437927b9e 100644 --- a/lisp/emacs-lisp/rx.el +++ b/lisp/emacs-lisp/rx.el @@ -111,11 +111,11 @@ ;; FIXME: support macros. (defvar rx-constituents ;Not `const' because some modes extend it. - '((and . (rx-and 1 nil)) + '((and . (rx-and 0 nil)) (seq . and) ; SRE (: . and) ; SRE (sequence . and) ; sregex - (or . (rx-or 1 nil)) + (or . (rx-or 0 nil)) (| . or) ; SRE (not-newline . ".") (nonl . not-newline) ; SRE @@ -391,9 +391,11 @@ FORM is of the form `(and FORM1 ...)'." "Parse and produce code from FORM, which is `(or FORM1 ...)'." (rx-check form) (rx-group-if - (if (memq nil (mapcar 'stringp (cdr form))) - (mapconcat (lambda (x) (rx-form x '|)) (cdr form) "\\|") + (cond + ((null (cdr form)) regexp-unmatchable) + ((cl-every #'stringp (cdr form)) (regexp-opt (cdr form) nil t)) + (t (mapconcat (lambda (x) (rx-form x '|)) (cdr form) "\\|"))) (and (memq rx-parent '(: * t)) rx-parent))) @@ -1122,6 +1124,7 @@ CHAR `(seq SEXP1 SEXP2 ...)' `(sequence SEXP1 SEXP2 ...)' matches what SEXP1 matches, followed by what SEXP2 matches, etc. + Without arguments, matches the empty string. `(submatch SEXP1 SEXP2 ...)' `(group SEXP1 SEXP2 ...)' @@ -1137,7 +1140,7 @@ CHAR `(| SEXP1 SEXP2 ...)' matches anything that matches SEXP1 or SEXP2, etc. If all args are strings, use `regexp-opt' to optimize the resulting - regular expression. + regular expression. Without arguments, never matches anything. `(minimal-match SEXP)' produce a non-greedy regexp for SEXP. 
Normally, regexps matching diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el index 4a5919edf0..6f392d616d 100644 --- a/test/lisp/emacs-lisp/rx-tests.el +++ b/test/lisp/emacs-lisp/rx-tests.el @@ -107,7 +107,13 @@ "ab")) (should (equal (and (string-match (rx (or "a" "ab" "abc")) s) (match-string 0 s)) - "a")))) + "a"))) + ;; Test zero-argument `or'. + (should (equal (rx (or)) regexp-unmatchable))) + +(ert-deftest rx-seq () + ;; Test zero-argument `seq'. + (should (equal (rx (seq)) ""))) (provide 'rx-tests) ;; rx-tests.el ends here. -- 2.20.1 (Apple Git-117) ^ permalink raw reply related [flat|nested] 37+ messages in thread
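According to the NEWS entry, `(or)` and `(seq)` are identities for their operations. The sketch below shows what that means using two hypothetical helpers, `my-re-or` and `my-re-seq`, which are not part of rx.el. Adding an unmatchable alternative to a `\\|` union leaves the set of matched strings unchanged, and so does splicing an empty element into a concatenation. The sketch assumes Emacs's shy-group syntax `\\(?:...\\)` and uses the literal `\\`a\\`` where `regexp-unmatchable` is not defined.

```elisp
;; Hypothetical helpers sketching the zero-argument semantics.
(defconst my-unmatchable-re
  (if (boundp 'regexp-unmatchable) regexp-unmatchable "\\`a\\`"))

(defun my-re-or (&rest regexps)
  "Return a regexp matching any of REGEXPS.
With no arguments, return a regexp that never matches."
  (if regexps
      (concat "\\(?:" (mapconcat #'identity regexps "\\|") "\\)")
    my-unmatchable-re))

(defun my-re-seq (&rest regexps)
  "Return a regexp matching REGEXPS one after another.
With no arguments, return the empty regexp."
  (mapconcat (lambda (r) (concat "\\(?:" r "\\)")) regexps ""))

;; The empty union matches nothing, and adding it as an alternative
;; changes nothing:
(string-match-p (my-re-or) "anything")                 ; => nil
(string-match-p (my-re-or "foo" (my-re-or)) "xfoo")    ; => 1
;; The empty sequence is "", and splicing it in changes nothing:
(my-re-seq)                                            ; => ""
(string-match-p (my-re-seq "a" (my-re-seq) "b") "xab") ; => 1
```

Wrapping each element in a shy group keeps a top-level `\\|` inside one element from leaking into its neighbours.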
* Re: regular expressions that match nothing 2019-05-15 21:07 ` Mattias Engdegård @ 2019-05-15 21:38 ` Michael Heerdegen 2019-05-16 6:57 ` More re odditie [Was: regular expressions that match nothing] phs 2019-05-20 16:26 ` Bootstrap/autoload policy (was Re: regular expressions that match nothing) Mattias Engdegård 2 siblings, 0 replies; 37+ messages in thread From: Michael Heerdegen @ 2019-05-15 21:38 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel Mattias Engdegård <mattiase@acm.org> writes: > On 15 May 2019, at 22:17, Michael Heerdegen <michael_heerdegen@web.de> wrote: > > > > Should there be an rx regexp form for this? > > We don't necessarily need a special form for it; we can just make > `(or)' work. > > Proposed patch attached. (I also added its dual, (seq), since it would > be silly not to.) Makes sense to me, thanks. Michael. ^ permalink raw reply [flat|nested] 37+ messages in thread
* More re odditie [Was: regular expressions that match nothing] 2019-05-15 21:07 ` Mattias Engdegård 2019-05-15 21:38 ` Michael Heerdegen @ 2019-05-16 6:57 ` phs 2019-05-16 9:29 ` Mattias Engdegård 2019-05-16 18:35 ` Michael Heerdegen 2019-05-20 16:26 ` Bootstrap/autoload policy (was Re: regular expressions that match nothing) Mattias Engdegård 2 siblings, 2 replies; 37+ messages in thread From: phs @ 2019-05-16 6:57 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Michael Heerdegen, Stefan Monnier, emacs-devel While testing how rx handles the new regexp-unmatchable construct, I stumbled on some odd Emacs behavior. The regular expression "*", instead of matching zero-or-more occurrences of the empty string, behaves as "\*" or "[*]" and only matches the * character. I believe this is a bug, not a feature, and in any case the Emacs manual does not document this behavior. Perhaps it is not a bug in Emacs per se, but rather a bug in the gnulib regex library, so I'll wait for comments before submitting a bug report... If this behavior is the expected one, then (rx (0+ "")) should not return "*". We have the same problem with "+" and (rx (1+ "")). -phs On 2019/05/15 23:07, Mattias Engdegård wrote: > On 15 May 2019, at 22:17, Michael Heerdegen <michael_heerdegen@web.de> wrote: >> Should there be an rx regexp form for this? > We don't necessarily need a special form for it; we can just make `(or)' work. > > Proposed patch attached. (I also added its dual, (seq), since it would be silly not to.) > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: More re odditie [Was: regular expressions that match nothing] 2019-05-16 6:57 ` More re odditie [Was: regular expressions that match nothing] phs @ 2019-05-16 9:29 ` Mattias Engdegård 2019-05-16 10:59 ` phs 2019-05-16 18:35 ` Michael Heerdegen 1 sibling, 1 reply; 37+ messages in thread From: Mattias Engdegård @ 2019-05-16 9:29 UTC (permalink / raw) To: phs; +Cc: Michael Heerdegen, Stefan Monnier, emacs-devel On 16 May 2019, at 08:57, phs <phs@lsv.fr> wrote: > > The regular expression "*", instead of matching zero-or-more occurrences > of the empty string, behaves as "\*" or "[*]" and only matches the * > character. > > I believe this is a bug, not a feature, and in any case the Emacs manual > does not document this behavior. Actually this one is documented: For historical compatibility, special characters are treated as ordinary ones if they are in contexts where their special meanings make no sense. For example, `*foo' treats `*' as ordinary since there is no preceding expression on which the `*' can act. and in any case the 'correct' behaviour would be to signal a syntax error, not repeat the empty string. > If this behavior is the expected one, then (rx (0+ "")) should not > return "*". Thanks for reporting it, and you are right, that's a (known) bug in rx. ^ permalink raw reply [flat|nested] 37+ messages in thread
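The manual passage quoted above is easy to confirm with `string-match-p`. When `*` or `+` has no preceding expression, the engine treats it as an ordinary character. The expected positions below assume that the Emacs regexp engine still has this historical behaviour. The last form is included for contrast: once an expression precedes it, `*` is the usual repetition operator.

```elisp
;; A leading "*" has nothing to repeat, so it is an ordinary character:
;; "*" behaves like "\\*" and only matches a literal asterisk.
(string-match-p "*" "a*b")     ; => 1
(string-match-p "\\*" "a*b")   ; => 1
(string-match-p "*" "abc")     ; => nil

;; The same holds for a leading "+".
(string-match-p "+" "1+2")     ; => 1

;; With a preceding expression, "*" repeats it: "b*" matches the
;; empty string at position 0.
(string-match-p "b*" "abc")    ; => 0
```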
* Re: More re odditie [Was: regular expressions that match nothing] 2019-05-16 9:29 ` Mattias Engdegård @ 2019-05-16 10:59 ` phs 2019-05-16 12:31 ` Stefan Monnier 0 siblings, 1 reply; 37+ messages in thread From: phs @ 2019-05-16 10:59 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Michael Heerdegen, Stefan Monnier, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1205 bytes --] Hi Mattias, On 2019/05/16 11:29, Mattias Engdegård wrote: > On 16 May 2019, at 08:57, phs <phs@lsv.fr> wrote: > Actually this one is documented: > > For historical compatibility, special characters are treated as ordinary > ones if they are in contexts where their special meanings make no sense. > For example, `*foo' treats `*' as ordinary since there is no preceding > expression on which the `*' can act. I missed that. Thanks for pointing it out. > and in any case the 'correct' behaviour would be to signal a syntax error, not repeat the empty string. I'd rather read `*' as meaning "repeat the empty string", as with `\(\)*', but this is a matter of taste, and historical compatibility is very important. BTW, can your scans of regexps tell if this compatibility is relied on a lot? It would be safe to replace `*' and `+' with `\*' and `\+' where this happens. I've just grep'ed quickly through the code and only noticed a risky use of "+" (and "[..]") in the definition of `term-word' in term.el. > Thanks for reporting it, and you are right, that's a (known) bug in rx.
When rx is fixed, I suggest we add the following extra tests (see patch) --phs [-- Attachment #2: more-rx-tests.patch --] [-- Type: text/plain, Size: 875 bytes --] diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el index 4a5919edf0..373bc9b2e3 100644 --- a/test/lisp/emacs-lisp/rx-tests.el +++ b/test/lisp/emacs-lisp/rx-tests.el @@ -109,5 +109,16 @@ (match-string 0 s)) "a")))) +(ert-deftest rx-kleene-empty-string () + ;; Test whether the repetition of empty string --that matches everywhere-- is handled correctly + (should (string-match (rx (zero-or-more "")) "foo")) + (should (string-match (rx (one-or-more "")) "foo"))) + +(ert-deftest rx-kleene-unmatchable () + ;; Test whether the repetition of regexp-unmatchable is handled correctly + (should (string-match (rx (zero-or-more (or))) "foo")) + (should-not (string-match (rx (or)) "foo")) + (should-not (string-match (rx (one-or-more (or))) "+"))) + (provide 'rx-tests) ;; rx-tests.el ends here. ^ permalink raw reply related [flat|nested] 37+ messages in thread
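The semantics these proposed tests pin down can already be checked with plain string regexps, independent of how rx renders them. Zero or more repetitions of an unmatchable regexp match the empty string at the first position tried. One or more repetitions never match. In this sketch, the unmatchable regexp is wrapped in a shy group so that the postfix operator applies to all of it. `my-unmatchable-re` is a hypothetical name, and the `\\`a\\`` literal is used where `regexp-unmatchable` is not defined. The last form repeats an empty group, `\\(\\)*`, as in phs's message.

```elisp
(defconst my-unmatchable-re
  (if (boundp 'regexp-unmatchable) regexp-unmatchable "\\`a\\`"))

;; R* with an unmatchable R still allows zero repetitions, so it
;; matches the empty string at position 0.
(string-match-p (concat "\\(?:" my-unmatchable-re "\\)*") "foo")  ; => 0

;; R+ needs at least one match of R, so it can never match.
(string-match-p (concat "\\(?:" my-unmatchable-re "\\)+") "+")    ; => nil

;; Repeating an empty group also matches the empty string at 0.
(string-match-p "\\(\\)*" "foo")                                   ; => 0
```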
* Re: More re odditie [Was: regular expressions that match nothing] 2019-05-16 10:59 ` phs @ 2019-05-16 12:31 ` Stefan Monnier 0 siblings, 0 replies; 37+ messages in thread From: Stefan Monnier @ 2019-05-16 12:31 UTC (permalink / raw) To: phs; +Cc: Michael Heerdegen, Mattias Engdegård, emacs-devel > I'd rather read `*' as meaning "repeat the empty string", `*` repeats the immediately preceding regular expression, but here there is no preceding regular expression: - there are various ways to resolve this problem. - all "reasonable" semantics are pretty useless in the sense that it's trivial to write another regexp with the same semantics. - the combination of the previous two points implies that signaling an error is probably the better option. > BTW, can your scans of regexps tell if this compatibility is relied on a > lot? His scan does catch those (and many other of its friends). > It would be safe to replace `*' and `+' with `\*' and `\+' where > this happens. We've done that, indeed. > I've just grep'ed quickly through the code and only noticed a risky > use of "+" (and "[..]") in the definition of `term-word' in term.el That's because we already fixed the occurrences that his tool finds ;-) >> Thanks for reporting it, and you are right, that's a (known) bug in rx. > When rx is fixed, I suggest we add the following extra tests (see patch) Good idea. Stefan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: More re odditie [Was: regular expressions that match nothing] 2019-05-16 6:57 ` More re odditie [Was: regular expressions that match nothing] phs 2019-05-16 9:29 ` Mattias Engdegård @ 2019-05-16 18:35 ` Michael Heerdegen 2019-05-16 20:31 ` Mattias Engdegård 1 sibling, 1 reply; 37+ messages in thread From: Michael Heerdegen @ 2019-05-16 18:35 UTC (permalink / raw) To: Mattias Engdegård; +Cc: phs, Stefan Monnier, emacs-devel BTW, while we are here: I'm also curious whether we want to add nl, newline -> "\n" to rx. I mean we have nonl and space - only for newlines one needs to use string syntax in rx. Michael. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: More re odditie [Was: regular expressions that match nothing] 2019-05-16 18:35 ` Michael Heerdegen @ 2019-05-16 20:31 ` Mattias Engdegård 2019-05-16 21:01 ` Global and local definitions of non-functions/variable (was: More re odditie [Was: regular expressions that match nothing]) Stefan Monnier 0 siblings, 1 reply; 37+ messages in thread From: Mattias Engdegård @ 2019-05-16 20:31 UTC (permalink / raw) To: Michael Heerdegen; +Cc: phs, Stefan Monnier, emacs-devel On 16 May 2019, at 20:35, Michael Heerdegen <michael_heerdegen@web.de> wrote: > > BTW, while we are here: I'm also curious whether we want to add > nl, newline -> "\n" to rx. I mean we have nonl and space - only for > newlines one needs to use string syntax in rx. Maybe, but there is no special name for any other literal string. `space' doesn't mean the space character but [[:space:]], or \s-, whose exact meaning depends on the current syntax table. (Which is sometimes very surprising; look at rfc2047-syntax-table!) `nonl' has plenty of legacy -- the name is from SRE, and even in string regexps it has a special convenience symbol. I would much rather see a clean, robust and expressive extension mechanism for rx (`rx-constituents' does not count). There are various hacks and libraries but it probably needs to be integrated. ^ permalink raw reply [flat|nested] 37+ messages in thread
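The explicit syntax-class form `\\s-`, which Mattias names as an equivalent, shows how this depends on the current syntax table. The sketch compares the standard syntax table with a made-up table that gives `_` whitespace syntax. This table is purely illustrative and is not used by any real mode. `my-whitespace-syntax-demo` is a hypothetical name. `string-match-p` consults the current buffer's syntax table, so the same regexp gives different results in the two cases.

```elisp
(defun my-whitespace-syntax-demo ()
  "Return match positions of \\s- in \"a_b\" under two syntax tables.
The first uses the standard table, the second a modified one."
  (with-temp-buffer
    ;; A fresh buffer uses the standard syntax table, where `_' has
    ;; symbol syntax, so \s- finds nothing.
    (let ((standard (string-match-p "\\s-" "a_b"))
          (table (make-syntax-table)))
      ;; Illustrative table only: give `_' whitespace syntax.
      (modify-syntax-entry ?_ " " table)
      (list standard
            (with-syntax-table table
              (string-match-p "\\s-" "a_b"))))))

(my-whitespace-syntax-demo)  ; => (nil 1)
```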
* Global and local definitions of non-functions/variable (was: More re odditie [Was: regular expressions that match nothing]) 2019-05-16 20:31 ` Mattias Engdegård @ 2019-05-16 21:01 ` Stefan Monnier 0 siblings, 0 replies; 37+ messages in thread From: Stefan Monnier @ 2019-05-16 21:01 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Michael Heerdegen, phs, emacs-devel > I would much rather see a clean, robust and expressive extension mechanism > for rx (`rx-constituents' does not count). There are various hacks and > libraries but it probably needs to be integrated. Agreed. Something like `rx-defmacro` and `rx-macrolet`. Reminds me that this is a recurring need (we see it in `pcase`, in `peg`, here, and arguably in `gv`). It would be nice if we could design a general solution that those packages can (re)use. Stefan PS: Arguably `cl-defmethod` could also be extended to a kind of `methodlet` for scoped methods, but it might be tricky to do that. OTOH, this might provide exactly the generic mechanism we need to implement the others. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Mattias Engdegård @ 2019-05-20 16:26 UTC
To: Emacs developers; +Cc: Stefan Monnier

On 15 May 2019 at 23:07, Mattias Engdegård <mattiase@acm.org> wrote:
>
> We don't necessarily need a special form for it; we can just make `(or)' work.
>
> Proposed patch attached. (I also added its dual, (seq), since it would be silly not to.)

Naïvely thinking the patch would be uncontroversial, I pushed it to master, only to see a bootstrap failure, and I could use some policy advice.

The patch uses `cl-every' in rx.el, and the surface error (now fixed) was that rx didn't require `cl-extra', where that function is defined, despite it being autoloaded. (rx is pulled in from compile.el, from bytecomp.el, from cl-generic.el, from loadup.el.)

What I would like to know is whether the bootstrap was trying to tell me something here. Was I wrong in using `cl-every' in the first place? How do I know when to trust autoload, and when to use an explicit require? Bootstrap after every change and see if it breaks?
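A minimal sketch of what the patch described above allows, assuming Emacs 27 or later, where rx accepts an `or` form and a `seq` form with no arguments. The constant names `my-never-re` and `my-always-re` are hypothetical. `(or)` is the identity for alternation and so never matches; its dual `(seq)` is the identity for concatenation, the empty regexp, which matches the empty string everywhere. The exact regexp string rx produces is an implementation detail, so only matching behavior is shown.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch assuming Emacs 27 or later, where rx accepts `or' and `seq'
;; forms with no arguments.  The constant names are hypothetical.
(require 'rx)

;; (or) has no alternatives: it is the identity element for alternation,
;; so the resulting regexp can never match anything (the same role as
;; the `regexp-unmatchable' constant discussed in this thread).
(defconst my-never-re (rx (or))
  "Hypothetical example: a regexp that matches nothing.")

;; (seq) has no elements: it is the identity element for concatenation,
;; i.e. the empty regexp, which matches the empty string at any position.
(defconst my-always-re (rx (seq))
  "Hypothetical example: the empty regexp.")

(string-match-p my-never-re "")     ; => nil, no match even in ""
(string-match-p my-never-re "abc")  ; => nil
(string-match-p my-always-re "abc") ; => 0, matches at the start
```

Because they are identities, these forms compose cleanly with generated code: an `or` built from an empty list of alternatives degrades to "never matches" instead of signaling an error.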
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Stefan Monnier @ 2019-05-22 14:02 UTC
To: Mattias Engdegård; +Cc: Emacs developers

> The patch uses `cl-every' in rx.el, and the surface error (now fixed) was
> that rx didn't require `cl-extra' where that function is defined, despite it
> being autoloaded.

cl-every is not autoloaded (in general). It's only autoloaded after you (require 'cl-lib), which is The Right Way to get it defined (as opposed to requiring cl-extra which is wrong because where cl-every is defined is an internal implementation detail of cl-lib).


        Stefan
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Mattias Engdegård @ 2019-05-22 14:07 UTC
To: Stefan Monnier; +Cc: Emacs developers

On 22 May 2019 at 16:02, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> cl-every is not autoloaded (in general). It's only autoloaded after you
> (require 'cl-lib), which is The Right Way to get it defined (as opposed
> to requiring cl-extra which is wrong because where cl-every is defined
> is an internal implementation detail of cl-lib).

I thought so too, but rx.el already did require cl-lib, and that was apparently not enough. Hence my confusion.
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Stefan Monnier @ 2019-05-22 14:24 UTC
To: emacs-devel

>> cl-every is not autoloaded (in general). It's only autoloaded after you
>> (require 'cl-lib), which is The Right Way to get it defined (as opposed
>> to requiring cl-extra which is wrong because where cl-every is defined
>> is an internal implementation detail of cl-lib).
> I thought so too, but rx.el already did require cl-lib, and that was
> apparently not enough. Hence my confusion.

Hmm... the bootstrap is a funny beast and we additionally impose extra constraints to try and catch some bugs so... without seeing the error's backtrace I don't think I can say much more at this point.


        Stefan
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Mattias Engdegård @ 2019-05-22 15:06 UTC
To: Stefan Monnier; +Cc: emacs-devel

On 22 May 2019 at 16:24, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> Hmm... the bootstrap is a funny beast and we additionally impose extra
> constraints to try and catch some bugs so... without seeing the error's
> backtrace I don't think I can say much more at this point.

There was no backtrace, but you should be able to reproduce by removing the
(require 'cl-extra) from rx.el and bootstrap.
I traced it manually thus (in call order):

loadup.el: (load "emacs-lisp/cl-generic")
cl-generic.el: toplevel: (cl--generic-prefill-dispatchers 0 (head eql))
cl-generic.el: cl--generic-prefill-dispatchers: call cl--generic-get-dispatcher
cl-generic.el: cl--generic-get-dispatcher: call byte-compile
bytecomp.el: (require 'compile)
compile.el: (defvar compilation-error-regexp-alist-alist ... (rx...))
rx.el: call cl-every from call to rx-to-string
Eager macro-expansion failure: (void-function cl-every)
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Stefan Monnier @ 2019-05-22 15:53 UTC
To: emacs-devel

Mattias Engdegård <mattiase@acm.org> writes:

> On 22 May 2019 at 16:24, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>> Hmm... the bootstrap is a funny beast and we additionally impose extra
>> constraints to try and catch some bugs so... without seeing the error's
>> backtrace I don't think I can say much more at this point.
>
> There was no backtrace, but you should be able to reproduce by removing the
> (require 'cl-extra) from rx.el and bootstrap.
> I traced it manually thus (in call order):
>
> loadup.el: (load "emacs-lisp/cl-generic")
> cl-generic.el: toplevel: (cl--generic-prefill-dispatchers 0 (head eql))
> cl-generic.el: cl--generic-prefill-dispatchers: call cl--generic-get-dispatcher
> cl-generic.el: cl--generic-get-dispatcher: call byte-compile
> bytecomp.el: (require 'compile)
> compile.el: (defvar compilation-error-regexp-alist-alist ... (rx...))
> rx.el: call cl-every from call to rx-to-string
> Eager macro-expansion failure: (void-function cl-every)

Ah, I think I know what's going on. See this part of cl-lib.el:

    (unless (load "cl-loaddefs" 'noerror 'quiet)
      ;; When bootstrapping, cl-loaddefs hasn't been built yet!
      (require 'cl-macs)
      (require 'cl-seq))

Adding `cl-extra` here would likely fix your problem (of course, using (memq nil (mapcar ...)) is a perfectly fine alternative as well).


        Stefan
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Mattias Engdegård @ 2019-05-22 16:40 UTC
To: Stefan Monnier; +Cc: emacs-devel

On 22 May 2019 at 17:53, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> Ah, I think I know what's going on. See this part of cl-lib.el:
>
>     (unless (load "cl-loaddefs" 'noerror 'quiet)
>       ;; When bootstrapping, cl-loaddefs hasn't been built yet!
>       (require 'cl-macs)
>       (require 'cl-seq))
>
> Adding `cl-extra` here would likely fix your problem

No, now the bootstrap just fails earlier:

Loading /Users/mattias/emacs/lisp/emacs-lisp/cl-preloaded.el (source)...
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Eager macro-expansion failure: (void-function help-add-fundoc-usage)
Symbol’s function definition is void: help-add-fundoc-usage

I think the (require 'cl-extra) better stay in rx.el until this mess has been sorted out.

> (of course, using
> (memq nil (mapcar ...)) is a perfectly fine alternative as well).

That's what I always have been using, but given that cl-every and cl-some are easier to read, understand, and write, and generally faster, I've sort of changed tack. Maybe we should add `every' and `some' as builtins?
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Stefan Monnier @ 2019-05-22 19:08 UTC
To: Mattias Engdegård; +Cc: emacs-devel

>> Ah, I think I know what's going on. See this part of cl-lib.el:
>>
>>     (unless (load "cl-loaddefs" 'noerror 'quiet)
>>       ;; When bootstrapping, cl-loaddefs hasn't been built yet!
>>       (require 'cl-macs)
>>       (require 'cl-seq))
>>
>> Adding `cl-extra` here would likely fix your problem
> No, now the bootstrap just fails earlier:

Welcome to the world of the bootstrap.

> I think the (require 'cl-extra) better stay in rx.el until this mess
> has been sorted out.

Or refrain from using rx's `or` from bytecomp.el, or refrain from calling byte-compile from cl-generic.el during the bootstrap, or ...

>> (of course, using
>> (memq nil (mapcar ...)) is a perfectly fine alternative as well).
> That's what I always have been using, but given that cl-every and cl-some
> are easier to read, understand, and write, and generally faster, I've sort
> of changed tack.

Either way is fine by me (w.r.t. speed, cl-every is usually slower per iteration, so whether it's overall faster or slower depends on how far down the list it has to get before it finds a nil).

> Maybe we should add `every' and `some' as builtins?

I prefer not to have an opinion on this ;-)


        Stefan
* Re: Bootstrap/autoload policy (was Re: regular expressions that match nothing)
From: Basil L. Contovounesios @ 2019-05-26 12:05 UTC
To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel

Mattias Engdegård <mattiase@acm.org> writes:

> That's what I always have been using, but given that cl-every and cl-some are
> easier to read, understand, and write, and generally faster, I've sort of
> changed tack. Maybe we should add `every' and `some' as builtins?

FWIW, there are also seq-some and seq-every-p, but I don't know whether seq.el is any bootstrap-friendlier.

-- 
Basil
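A minimal sketch contrasting the two idioms discussed in this subthread; every function prefixed `my-` is hypothetical. It loads `cl-lib` itself (not `cl-extra`), as recommended earlier in the thread, and counts predicate calls to illustrate the trade-off Stefan describes: `(memq nil (mapcar ...))` calls the predicate on every element and builds an intermediate list, while `cl-every` stops at the first element for which the predicate returns nil. No timings are measured here.

```elisp
;;; -*- lexical-binding: t -*-
;; Load cl-lib itself; which file defines `cl-every' is an internal
;; detail of cl-lib.
(require 'cl-lib)

(defun my-every-via-mapcar (pred list)
  "Return non-nil if PRED holds for every element of LIST.
Core-only idiom: `mapcar' calls PRED on every element and conses a
list of results, then `memq' searches that list for a nil."
  (not (memq nil (mapcar pred list))))

(defun my-every-via-cl-every (pred list)
  "Return non-nil if PRED holds for every element of LIST.
`cl-every' stops at the first element for which PRED returns nil."
  (cl-every pred list))

(defun my-count-calls (every-fn list)
  "Return how many times EVERY-FN calls its predicate on LIST."
  (let ((calls 0))
    (funcall every-fn
             (lambda (x) (setq calls (1+ calls)) (stringp x))
             list)
    calls))

;; The first element already fails the test:
(my-count-calls #'my-every-via-mapcar '(1 "a" "b"))   ; => 3
(my-count-calls #'my-every-via-cl-every '(1 "a" "b")) ; => 1
```

When every element passes, both forms visit the whole list, so the relative cost then comes down to per-iteration overhead; `seq-every-p` from seq.el, mentioned above, is a further alternative not shown here.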
* Re: regular expressions that match nothing
From: Eric Abrahamsen @ 2019-05-16 18:12 UTC
To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel

Mattias Engdegård <mattiase@acm.org> writes:

> On Tue, 2019-05-14 at 15:41 -0400, Stefan Monnier wrote:
>
>> Yes, please. I'd recommend a `regexp-` prefix for it.
>
> Well, since you asked so nicely!
>
>> [ And I'll carefully avoid having an opinion on the rest of the name.
>> ]
>
> The correct name is obviously something like `regexp-empty', but I have
> to concede that it might be misinterpreted. The attached patch uses
> `regexp-unmatchable' which is reasonably descriptive. Better
> suggestions are welcome.

regexp-null? It's also a little cryptic, but it's short!
* Re: regular expressions that match nothing
From: net june @ 2019-05-19 4:30 UTC
To: Eric Abrahamsen, Mattias Engdegård
Cc: Stefan Monnier, emacs-devel@gnu.org

What about regexp-nothing or re-nothing?

________________________________
From: Emacs-devel <emacs-devel-bounces+netjune=outlook.com@gnu.org> on behalf of Eric Abrahamsen <eric@ericabrahamsen.net>
Sent: 16 May 2019 18:12
To: Mattias Engdegård
Cc: Stefan Monnier; emacs-devel@gnu.org
Subject: Re: regular expressions that match nothing

Mattias Engdegård <mattiase@acm.org> writes:

> On Tue, 2019-05-14 at 15:41 -0400, Stefan Monnier wrote:
>
>> Yes, please. I'd recommend a `regexp-` prefix for it.
>
> Well, since you asked so nicely!
>
>> [ And I'll carefully avoid having an opinion on the rest of the name.
>> ]
>
> The correct name is obviously something like `regexp-empty', but I have
> to concede that it might be misinterpreted. The attached patch uses
> `regexp-unmatchable' which is reasonably descriptive. Better
> suggestions are welcome.

regexp-null? It's also a little cryptic, but it's short!
* Re: regular expressions that match nothing
From: HaiJun Zhang @ 2019-05-19 5:00 UTC
To: Emacs developers

> regexp-null? It's also a little cryptic, but it's short!

What about regexp-nothing or re-nothing?
* Re: regular expressions that match nothing
From: Mattias Engdegård @ 2019-05-19 7:32 UTC
To: HaiJun Zhang; +Cc: Emacs developers

On 19 May 2019 at 07:00, HaiJun Zhang <netjune@outlook.com> wrote:
>
> > regexp-null? It's also a little cryptic, but it's short!
>
> What about regexp-nothing or re-nothing?

Those are fair suggestions and close to the mathematical meaning, but could possibly be misunderstood as patterns (only) matching the empty string.
* Re: regular expressions that match nothing
From: philippe schnoebelen @ 2019-05-20 7:56 UTC
To: Mattias Engdegård; +Cc: HaiJun Zhang, Emacs developers

The (real) risk of misunderstanding exists with all options that include "null", "empty", "void", "nothing", "nada". The name refers to a set of matches but can be understood as qualifying the match itself, as a set of chars. If you teach programming and mark code written by students, you know that the confusion will definitely happen.

This is why I much prefer Alan's re-nomatch suggestion.

On Sun, 19 May 2019 at 09:33, Mattias Engdegård <mattiase@acm.org> wrote:

> On 19 May 2019 at 07:00, HaiJun Zhang <netjune@outlook.com> wrote:
> >
> > > regexp-null? It's also a little cryptic, but it's short!
> >
> > What about regexp-nothing or re-nothing?
>
> Those are fair suggestions and close to the mathematical meaning, but
> could possibly be misunderstood as patterns (only) matching the empty
> string.
* Re: regular expressions that match nothing
From: Richard Stallman @ 2019-05-20 23:19 UTC
To: philippe schnoebelen; +Cc: mattiase, netjune, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Maybe these will avoid confusion about matching the empty string:

  regexp-never
  regexp-nomatch
  regexp-dontmatch
  regexp-fail

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
* RE: Re: regular expressions that match nothing
From: Drew Adams @ 2019-05-19 14:12 UTC
To: net june, Eric Abrahamsen, Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel

> What about regexp-nothing or re-nothing?
                               ^^^^^^^^^^
Not bad, considering that the abbreviation "re" means "regarding" or "referencing".

So[,] much ado about nothing. ;-)
end of thread, other threads:[~2019-05-26 12:05 UTC | newest]

Thread overview: 37+ messages
2019-05-14  7:25 regular expressions that match nothing philippe schnoebelen
2019-05-14 10:14 ` Mattias Engdegård
2019-05-14 19:41 ` Stefan Monnier
2019-05-15 16:21 ` Mattias Engdegård
2019-05-15 19:41 ` Alan Mackenzie
2019-05-16 10:54 ` Mattias Engdegård
2019-05-16 23:18 ` Phil Sainty
2019-05-17  9:43 ` Alan Mackenzie
2019-05-17 10:17 ` Mattias Engdegård
2019-05-17 12:53 ` Stefan Monnier
2019-05-15 20:17 ` Michael Heerdegen
2019-05-15 21:06 ` Stefan Monnier
2019-05-15 21:07 ` Mattias Engdegård
2019-05-15 21:38 ` Michael Heerdegen
2019-05-16  6:57 ` More re odditie [Was: regular expressions that match nothing] phs
2019-05-16  9:29 ` Mattias Engdegård
2019-05-16 10:59 ` phs
2019-05-16 12:31 ` Stefan Monnier
2019-05-16 18:35 ` Michael Heerdegen
2019-05-16 20:31 ` Mattias Engdegård
2019-05-16 21:01 ` Global and local definitions of non-functions/variable (was: More re odditie [Was: regular expressions that match nothing]) Stefan Monnier
2019-05-20 16:26 ` Bootstrap/autoload policy (was Re: regular expressions that match nothing) Mattias Engdegård
2019-05-22 14:02 ` Stefan Monnier
2019-05-22 14:07 ` Mattias Engdegård
2019-05-22 14:24 ` Stefan Monnier
2019-05-22 15:06 ` Mattias Engdegård
2019-05-22 15:53 ` Stefan Monnier
2019-05-22 16:40 ` Mattias Engdegård
2019-05-22 19:08 ` Stefan Monnier
2019-05-26 12:05 ` Basil L. Contovounesios
2019-05-16 18:12 ` regular expressions that match nothing Eric Abrahamsen
2019-05-19  4:30 ` Re: " net june
2019-05-19  5:00 ` HaiJun Zhang
2019-05-19  7:32 ` Mattias Engdegård
2019-05-20  7:56 ` philippe schnoebelen
2019-05-20 23:19 ` Richard Stallman
2019-05-19 14:12 ` Re: " Drew Adams
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git