From: Paul Eggert <eggert@cs.ucla.edu>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Scan of regexp mistakes
Date: Fri, 8 Mar 2019 09:13:04 -0800 [thread overview]
Message-ID: <e2c83a03-a6ba-5b44-b49e-d2c4da447213@cs.ucla.edu> (raw)
In-Reply-To: <D3E6A383-1368-46F0-8D37-7C45FD322DDC@acm.org>
[-- Attachment #1: Type: text/plain, Size: 5271 bytes --]
On 3/5/19 7:06 AM, Mattias Engdegård wrote:
>
> I can run it periodically but would surely forget. Should I put the trawler in the Emacs source tree (if so, where?), in ELPA, or elsewhere?
Stefan mentioned one possibility. Though even then I daresay it'd be
helpful if you ran it periodically, just as I periodically run
admin/merge-gnulib. (If you don't run it, it's likely nobody else will....)
> - (re-search-forward "^\\(\s+=+\s?+\\)+\n")
> + (re-search-forward "^\\(\s+=+\s+\\)+\n")
> ^^^
> Are you sure this shouldn't be `\s*'
No, good point. I'll change it to that. See attached patch.
> - "[a-z0-9$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
> + "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
>
> You kept the rather odd range `*-=' which comprises `*+,-./0123456789:;<='. Is it supposed to be that way?
Goodness knows what is intended here, as this is some ad hoc variant of
RFC 5322/2822/822 and I don't know which variant. Might as well spell
out the range in a more-conventional way, though. The attached patch
replaces *-= with *+./= (as ,:;< aren't allowed unquoted in RFC 5322
atoms) and puts - and 0-9 elsewhere; this should be closer to what was
wanted and should be clearer anyway. I was unable to track down whatever
suggestion was made by Felix Wiemann long ago, and so removed that
comment (since the regexp no longer matches his suggestion anyway).
> - (while (re-search-forward "\\ce[»\\.\\?]\\|«\\ce" nil t)
> + (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
>
> Should `\' really be kept in the set of characters? It looks like it was only included as an attempt to escape `.' and `?'.
Yes, probably. Fixed in the attached.
> searching for A-z uncovers more suspect regexps, some of which aren't found by the trawler.
I wonder where those all came from? I attempted to fix them in the attached.
> Here is another one in the same file (line 33), but that wasn't found by the trawler:
>
> (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
>
> That \032 doesn't look right (number base confusion?), and it looks like it's meant as a single character alternative but it isn't, given the misplaced `]'.
The regexp has other troubles. It doesn't include !$'()*+,/:@&= (all of
which are reserved characters according to RFC 3986), and it has
duplicate %. The attached patch fixes the % and puts in a FIXME about
the other chars.
> diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
> index 1ff6358403..83dcc7b0d1 100644
> --- a/lisp/org/org-mobile.el
> +++ b/lisp/org/org-mobile.el
> @@ -845,11 +845,11 @@ If BEG and END are given, only do this in that region."
> (cl-incf cnt-error)
> (throw 'next t))
> (move-marker bos-marker (point))
> - (if (re-search-forward "^** Old value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
>
> Shouldn't this start with "^\\**", or does it have to be exactly two asterisks?
>
> (setq old (buffer-substring
> (1+ (match-end 0))
> (progn (outline-next-heading) (point)))))
> - (if (re-search-forward "^** New value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
>
> Idem.
"\\**" would be safer, yes. Fixed in the attached.
> - ((string-match "^//\\(.?*\\)/\\(<.*>\\)$" path)
> + ((string-match "^//\\(.*\\)/\\(<.*>\\)$" path)
>
> Another repetition-of-repetition. Sure it shouldn't be `*?' instead? It looks likely, since there is a `/' following that would be eaten by the `.*' given half a chance.
The comment on the next line says "Planner has the id after the final
slash", which implies that the first .* should indeed be greedy.
>
> diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
> index be272c0922..c1a267f4c5 100644
> --- a/lisp/progmodes/fortran.el
> +++ b/lisp/progmodes/fortran.el
> @@ -2052,7 +2052,7 @@ If ALL is nil, only match comments that start in column > 0."
> (when (<= (point) bos)
> (move-to-column (1+ fill-column))
> ;; What is this doing???
> - (or (re-search-forward "[\t\n,'+-/*)=]" eol t)
> + (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
>
> Where did the . come from? Don't you think that `+-/*' were meant to include those four symbols only?
I couldn't figure out what the code was doing (note the comment...) so
decided to preserve the semantics of the old regexp. But you're right,
"." is likely not intended there. I removed it in the attached.
> ;; Regexp bug in XEmacs disallows ][ inside [], and wants + last
> - "\\s-*\\.\\(\\([a-zA-Z0-9`_$+@^.*?|---]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> + "\\s-*\\.\\(\\([-a-zA-Z0-9`_$+@^.*?]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> (setq rep (match-string-no-properties 3))
> (goto-char (match-end 0))
> (setq tpl-wild-list
>
> Are you sure that | shouldn't be there too? Or is this some kind of XEmacs idiom?
>
You're right. Wilson Snyder later stepped in and fixed that string. Nice
to have a real expert in the house.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-More-regexp-corrections-and-tweaks.patch --]
[-- Type: text/x-patch; name="0001-More-regexp-corrections-and-tweaks.patch", Size: 9246 bytes --]
From 03d2da31d311def9773ede09b6ad89b094c61805 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 8 Mar 2019 09:08:46 -0800
Subject: [PATCH] More regexp corrections and tweaks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From suggestions by Mattias Engdegård in:
https://lists.gnu.org/r/emacs-devel/2019-03/msg00131.html
* lisp/arc-mode.el (archive-rar-summarize):
* lisp/gnus/gnus-art.el (gnus-button-valid-localpart-regexp):
* lisp/language/ethio-util.el (ethio-fidel-to-tex-buffer):
* lisp/nxml/rng-uri.el (rng-file-name-uri):
* lisp/org/org-mobile.el (org-mobile-apply):
* lisp/progmodes/cperl-mode.el (cperl-init-faces):
* lisp/progmodes/fortran.el (fortran-fill):
* lisp/progmodes/mantemp.el (mantemp-remove-comments)
(mantemp-remove-memfuncs, mantemp-insert-cxx-syntax):
* lisp/speedbar.el (speedbar-directory-buttons-follow):
* lisp/vc/add-log.el (change-log-font-lock-keywords):
Fix more regular expressions that seem to be typos or infelicities.
---
lisp/arc-mode.el | 2 +-
lisp/gnus/gnus-art.el | 3 +--
lisp/language/ethio-util.el | 2 +-
lisp/nxml/rng-uri.el | 7 ++++---
lisp/org/org-mobile.el | 4 ++--
lisp/progmodes/cperl-mode.el | 4 ++--
lisp/progmodes/fortran.el | 2 +-
lisp/progmodes/mantemp.el | 8 ++++----
lisp/speedbar.el | 2 +-
lisp/vc/add-log.el | 2 +-
10 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 2afde7ee75..6a58d61a54 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -2016,7 +2016,7 @@ archive-rar-summarize
(call-process "lsar" nil t nil "-l" (or file copy))
(if copy (delete-file copy))
(goto-char (point-min))
- (re-search-forward "^\\(\s+=+\s+\\)+\n")
+ (re-search-forward "^\\(\s+=+\s*\\)+\n")
(while (looking-at (concat "^\s+[0-9.]+\s+D?-+\s+" ; Flags
"\\([0-9-]+\\)\s+" ; Size
"\\([-0-9.%]+\\|-+\\)\s+" ; Ratio
diff --git a/lisp/gnus/gnus-art.el b/lisp/gnus/gnus-art.el
index fa3abfac58..baf44cb483 100644
--- a/lisp/gnus/gnus-art.el
+++ b/lisp/gnus/gnus-art.el
@@ -7376,9 +7376,8 @@ gnus-button-valid-fqdn-regexp
:group 'gnus-article-buttons
:type 'regexp)
-;; Regexp suggested by Felix Wiemann in <87oeuomcz9.fsf@news2.ososo.de>
(defcustom gnus-button-valid-localpart-regexp
- "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
+ "[-a-z0-9$%(*+./=?[_][^<>\")!;:,{}\n\t @]*"
"Regular expression that matches a localpart of mail addresses or MIDs."
:version "22.1"
:group 'gnus-article-buttons
diff --git a/lisp/language/ethio-util.el b/lisp/language/ethio-util.el
index 512d49b9c5..04b15ddd9a 100644
--- a/lisp/language/ethio-util.el
+++ b/lisp/language/ethio-util.el
@@ -804,7 +804,7 @@ ethio-fidel-to-tex-buffer
;; Special Ethiopic punctuation.
(goto-char (point-min))
- (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
+ (while (re-search-forward "\\ce[».?]\\|«\\ce" nil t)
(cond
((= (setq ch (preceding-char)) ?\»)
(delete-char -1)
diff --git a/lisp/nxml/rng-uri.el b/lisp/nxml/rng-uri.el
index d8f2884f5e..798475bbc3 100644
--- a/lisp/nxml/rng-uri.el
+++ b/lisp/nxml/rng-uri.el
@@ -30,9 +30,10 @@ rng-file-name-uri
escape them using %HH."
(setq f (expand-file-name f))
(let ((url
- (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
- 'rng-percent-encode
- f)))
+ ;; FIXME. Explain why the pattern doesn't also have "!$&'()*+,/:@=".
+ ;; See Internet RFC 3986 section 2.2.
+ (replace-regexp-in-string "[]\0-\s\"#%;<>?[\\^`{|}\177]"
+ 'rng-percent-encode f)))
(concat "file:"
(if (and (> (length url) 0)
(= (aref url 0) ?/))
diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
index 83dcc7b0d1..8b4e895388 100644
--- a/lisp/org/org-mobile.el
+++ b/lisp/org/org-mobile.el
@@ -845,11 +845,11 @@ org-mobile-apply
(cl-incf cnt-error)
(throw 'next t))
(move-marker bos-marker (point))
- (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
+ (if (re-search-forward "^\\** Old value[ \t]*$" eos t)
(setq old (buffer-substring
(1+ (match-end 0))
(progn (outline-next-heading) (point)))))
- (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
+ (if (re-search-forward "^\\** New value[ \t]*$" eos t)
(setq new (buffer-substring
(1+ (match-end 0))
(progn (outline-next-heading)
diff --git a/lisp/progmodes/cperl-mode.el b/lisp/progmodes/cperl-mode.el
index 0fe4b106c5..a9402e17a9 100644
--- a/lisp/progmodes/cperl-mode.el
+++ b/lisp/progmodes/cperl-mode.el
@@ -5736,9 +5736,9 @@ cperl-init-faces
(if (eq (char-after (cperl-1- (match-end 0))) ?\{ )
'font-lock-function-name-face
'font-lock-variable-name-face))))
- '("\\<\\(package\\|require\\|use\\|import\\|no\\|bootstrap\\)[ \t]+\\([a-zA-z_][a-zA-z_0-9:]*\\)[ \t;]" ; require A if B;
+ '("\\<\\(package\\|require\\|use\\|import\\|no\\|bootstrap\\)[ \t]+\\([a-zA-Z_][a-zA-Z_0-9:]*\\)[ \t;]" ; require A if B;
2 font-lock-function-name-face)
- '("^[ \t]*format[ \t]+\\([a-zA-z_][a-zA-z_0-9:]*\\)[ \t]*=[ \t]*$"
+ '("^[ \t]*format[ \t]+\\([a-zA-Z_][a-zA-Z_0-9:]*\\)[ \t]*=[ \t]*$"
1 font-lock-function-name-face)
(cond ((featurep 'font-lock-extra)
'("\\([]}\\\\%@>*&]\\|\\$[a-zA-Z0-9_:]*\\)[ \t]*{[ \t]*\\(-?[a-zA-Z0-9_:]+\\)[ \t]*}"
diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
index c1a267f4c5..b8aa521cf6 100644
--- a/lisp/progmodes/fortran.el
+++ b/lisp/progmodes/fortran.el
@@ -2052,7 +2052,7 @@ fortran-fill
(when (<= (point) bos)
(move-to-column (1+ fill-column))
;; What is this doing???
- (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
+ (or (re-search-forward "[-\t\n,'+/*)=]" eol t)
(goto-char bol)))
(if (bolp)
(re-search-forward "[ \t]" opoint t))
diff --git a/lisp/progmodes/mantemp.el b/lisp/progmodes/mantemp.el
index 9beeb4aae6..4190a84727 100644
--- a/lisp/progmodes/mantemp.el
+++ b/lisp/progmodes/mantemp.el
@@ -89,7 +89,7 @@ mantemp-remove-comments
(save-excursion
(goto-char (point-min))
(message "Removing comments")
- (while (re-search-forward "^[A-z.()+0-9: ]*`\\|'.*$" nil t)
+ (while (re-search-forward "^[a-zA-Z.()+0-9: ]*`\\|'.*$" nil t)
(replace-match ""))))
(defun mantemp-remove-memfuncs ()
@@ -99,14 +99,14 @@ mantemp-remove-memfuncs
(goto-char (point-min))
(message "Removing member function extensions")
(while (re-search-forward
- "^[A-z :&*<>~=,0-9+]*>::operator " nil t nil)
+ "^[a-zA-Z :&*<>~=,0-9+]*>::operator " nil t nil)
(progn
(backward-char 11)
(delete-region (point) (line-end-position))))
;; Remove other member function extensions.
(goto-char (point-min))
(message "Removing member function extensions")
- (while (re-search-forward "^[A-z :&*<>~=,0-9+]*>::" nil t nil)
+ (while (re-search-forward "^[a-zA-Z :&*<>~=,0-9+]*>::" nil t nil)
(progn
(backward-char 2)
(delete-region (point) (line-end-position))))))
@@ -154,7 +154,7 @@ mantemp-insert-cxx-syntax
(goto-char (point-min))
(message "Inserting 'template' for functions")
(while (re-search-forward
- "^template class [A-z :&*<>~=,0-9+!]*(" nil t nil)
+ "^template class [a-zA-Z :&*<>~=,0-9+!]*(" nil t nil)
(progn
(beginning-of-line)
(forward-word-strictly 1)
diff --git a/lisp/speedbar.el b/lisp/speedbar.el
index 46b3f2ea90..a7fd564e94 100644
--- a/lisp/speedbar.el
+++ b/lisp/speedbar.el
@@ -3388,7 +3388,7 @@ speedbar-directory-buttons-follow
"Speedbar click handler for default directory buttons.
TEXT is the button clicked on. TOKEN is the directory to follow.
INDENT is the current indentation level and is unused."
- (if (string-match "^[A-z]:$" token)
+ (if (string-match "^[A-Za-z]:$" token)
(setq default-directory (concat token "/"))
(setq default-directory token))
;; Because we leave speedbar as the current buffer,
diff --git a/lisp/vc/add-log.el b/lisp/vc/add-log.el
index 9fe06bbf52..f9efd44c5c 100644
--- a/lisp/vc/add-log.el
+++ b/lisp/vc/add-log.el
@@ -239,7 +239,7 @@ change-log-font-lock-keywords
;; wrongly with a non-date line existing as a random note. In
;; addition, using any kind of fixed setting like this doesn't
;; work if a user customizes add-log-time-format.
- ("^[0-9-]+ +\\|^ \\{11,\\}\\|^\t \\{3,\\}\\|^\\(Sun\\|Mon\\|Tue\\|Wed\\|Thu\\|Fri\\|Sat\\) [A-z][a-z][a-z] [0-9:+ ]+"
+ ("^[0-9-]+ +\\|^ \\{11,\\}\\|^\t \\{3,\\}\\|^\\(Sun\\|Mon\\|Tue\\|Wed\\|Thu\\|Fri\\|Sat\\) [A-Z][a-z][a-z] [0-9:+ ]+"
(0 'change-log-date)
;; Name and e-mail; some people put e-mail in parens, not angles.
("\\([^<(]+?\\)[ \t]*[(<]\\([A-Za-z0-9_.+-]+@[A-Za-z0-9_.-]+\\)[>)]" nil nil
--
2.20.1
next prev parent reply other threads:[~2019-03-08 17:13 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-03 18:01 Scan of regexp mistakes Mattias Engdegård
2019-03-05 2:04 ` Paul Eggert
2019-03-05 4:30 ` Clément Pit-Claudel
2019-03-05 15:06 ` Mattias Engdegård
2019-03-08 6:02 ` Stefan Monnier
2019-03-08 17:13 ` Paul Eggert [this message]
[not found] ` <1c44342f-01ab-1bc6-4b5c-3485a6f01fda@lanl.gov>
[not found] ` <61035fed-564a-8766-8982-0d05acea341c@cs.ucla.edu>
2019-03-08 20:46 ` bug#34789: " Davis Herring
2019-03-11 14:20 ` Eli Zaretskii
2019-03-09 12:36 ` Mattias Engdegård
2019-03-09 17:14 ` Paul Eggert
2019-03-09 13:09 ` Mattias Engdegård
2019-03-09 17:12 ` Paul Eggert
2019-03-09 20:16 ` Mattias Engdegård
[not found] <20190305112504.D97DD1EC@emma.svaha.wsnyder.org>
2019-03-05 15:35 ` Wilson Snyder
2019-03-05 16:12 ` Mattias Engdegård
2019-03-07 17:39 ` Paul Eggert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e2c83a03-a6ba-5b44-b49e-d2c4da447213@cs.ucla.edu \
--to=eggert@cs.ucla.edu \
--cc=emacs-devel@gnu.org \
--cc=mattiase@acm.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.