From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Scan of regexp mistakes
Date: Fri, 8 Mar 2019 09:13:04 -0800
Organization: UCLA Computer Science Department
References: <3ef768c2-98d9-a42d-067a-4a5ffc945cf4@cs.ucla.edu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------6B375B7658124D733DCC9700"
Cc: emacs-devel
To: Mattias Engdegård

This is a multi-part message in MIME format.
--------------6B375B7658124D733DCC9700
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 3/5/19 7:06 AM, Mattias Engdegård wrote:
>
> I can run it periodically but would surely forget. Should I put the trawler in the Emacs source tree (if so, where?), in ELPA, or elsewhere?

Stefan mentioned one possibility. Though even then I daresay it'd be helpful if you ran it periodically, just as I periodically run admin/merge-gnulib. (If you don't run it, it's likely nobody else will....)

> - (re-search-forward "^\\(\s+=+\s?+\\)+\n")
> + (re-search-forward "^\\(\s+=+\s+\\)+\n")
> ^^^
> Are you sure this shouldn't be `\s*'

No, good point. I'll change it to that. See attached patch.

> - "[a-z0-9$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
> + "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
>
> You kept the rather odd range `*-=' which comprises `*+,-./0123456789:;<='. Is it supposed to be that way?

Goodness knows what is intended here, as this is some ad hoc variant of RFC 5322/2822/822 and I don't know which variant. Might as well spell out the range in a more-conventional way, though. The attached patch replaces *-= with *+./= (as ,:;< aren't allowed unquoted in RFC 5322 atoms) and puts - and 0-9 elsewhere; this should be closer to what was wanted and should be clearer anyway. I was unable to track down whatever suggestion was made by Felix Wiemann long ago, and so removed that comment (since the regexp no longer matches his suggestion anyway).
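
To double-check the arithmetic on that range, here is a minimal sketch in Emacs Lisp. The helper `my-class-members` is a made-up name for illustration, not something in Emacs; it tries each printable ASCII character against a bracket expression and collects the ones that match.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
;; List the printable ASCII characters (codes 32..126) that the
;; bracket expression CLASS matches.
(defun my-class-members (class)
  "Return a string of the printable ASCII characters matched by CLASS."
  (let ((case-fold-search nil)
        (chars '()))
    (dotimes (i 95)
      (let ((s (string (+ 32 i))))
        (when (string-match-p class s)
          (push s chars))))
    (apply #'concat (nreverse chars))))

;; The old range, as a bracket expression on its own.
(my-class-members "[*-=]")        ; => "*+,-./0123456789:;<="

;; The spelled-out replacement: `-' first, then 0-9 and *+./=
(my-class-members "[-0-9*+./=]")  ; => "*+-./0123456789="
```

The two results differ by exactly `,:;<`, the characters that the spelled-out form leaves out.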

> - (while (re-search-forward "\\ce[»\\.\\?]\\|«\\ce" nil t)
> + (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
>
> Should `\' really be kept in the set of characters? It looks like it was only included as an attempt to escape `.' and `?'.

Yes, probably. Fixed in the attached.

> searching for A-z uncovers more suspect regexps, some of which aren't found by the trawler.

I wonder where those all came from? I attempted to fix them in the attached.

> Here is another one in the same file (line 33), but that wasn't found by the trawler:
>
> (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
>
> That \032 doesn't look right (number base confusion?), and it looks like it's meant as a single character alternative but it isn't, given the misplaced `]'.

The regexp has other troubles. It doesn't include !$'()*+,/:@&= (all of which are reserved characters according to RFC 3986), and it has duplicate %. The attached patch fixes the % and puts in a FIXME about the other chars.

> diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
> index 1ff6358403..83dcc7b0d1 100644
> --- a/lisp/org/org-mobile.el
> +++ b/lisp/org/org-mobile.el
> @@ -845,11 +845,11 @@ If BEG and END are given, only do this in that region."
> (cl-incf cnt-error)
> (throw 'next t))
> (move-marker bos-marker (point))
> - (if (re-search-forward "^** Old value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
>
> Shouldn't this start with "^\\**", or does it have to be exactly two asterisks?
>
> (setq old (buffer-substring
> (1+ (match-end 0))
> (progn (outline-next-heading) (point)))))
> - (if (re-search-forward "^** New value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
>
> Idem.

"\\**" would be safer, yes. Fixed in the attached.

> - ((string-match "^//\\(.?*\\)/\\(<.*>\\)$" path)
> + ((string-match "^//\\(.*\\)/\\(<.*>\\)$" path)
>
> Another repetition-of-repetition. Sure it shouldn't be `*?' instead? It looks likely, since there is a `/' following that would be eaten by the `.*' given half a chance.

The comment on the next line says "Planner has the id after the final slash", which implies that the first .* should indeed be greedy.

>
> diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
> index be272c0922..c1a267f4c5 100644
> --- a/lisp/progmodes/fortran.el
> +++ b/lisp/progmodes/fortran.el
> @@ -2052,7 +2052,7 @@ If ALL is nil, only match comments that start in column > 0."
> (when (<= (point) bos)
> (move-to-column (1+ fill-column))
> ;; What is this doing???
> - (or (re-search-forward "[\t\n,'+-/*)=]" eol t)
> + (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
>
> Where did the . come from? Don't you think that `+-/*' were meant to include those four symbols only?

I couldn't figure out what the code was doing (note the comment...) so decided to preserve the semantics of the old regexp. But you're right, "." is likely not intended there. I removed it in the attached.

> ;; Regexp bug in XEmacs disallows ][ inside [], and wants + last
> - "\\s-*\\.\\(\\([a-zA-Z0-9`_$+@^.*?|---]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> + "\\s-*\\.\\(\\([-a-zA-Z0-9`_$+@^.*?]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> (setq rep (match-string-no-properties 3))
> (goto-char (match-end 0))
> (setq tpl-wild-list
>
> Are you sure that | shouldn't be there too? Or is this some kind of XEmacs idiom?
>

You're right. Wilson Snyder later stepped in and fixed that string. Nice to have a real expert in the house.
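
For the fortran.el bracket expression discussed above, a quick sketch shows how the three variants treat `.` and `-`. The helper `my-matches-p` is a made-up name for illustration, not part of Emacs, and the three strings are copied from the hunks quoted in this thread.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my-matches-p (class char)
  "Return t if bracket expression CLASS matches the character CHAR."
  (let ((case-fold-search nil))
    (and (string-match-p class (string char)) t)))

(let ((old        "[\t\n,'+-/*)=]")    ; `+-/' is a range covering +,-./
      (preserving "[-\t\n,'+./*)=]")   ; same set, spelled out
      (revised    "[-\t\n,'+/*)=]"))   ; `.' dropped
  (list (mapcar (lambda (re) (my-matches-p re ?.))
                (list old preserving revised))
        (mapcar (lambda (re) (my-matches-p re ?-))
                (list old preserving revised))))
;; => ((t t nil) (t t t))
```

Only the treatment of `.` changes between the old and the revised expression; `-` is matched by all three, since it was already inside the old `+-/` range.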

--------------6B375B7658124D733DCC9700
Content-Type: text/x-patch; name="0001-More-regexp-corrections-and-tweaks.patch"
Content-Disposition: attachment; filename="0001-More-regexp-corrections-and-tweaks.patch"
Content-Transfer-Encoding: 8bit

From 03d2da31d311def9773ede09b6ad89b094c61805 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 8 Mar 2019 09:08:46 -0800
Subject: [PATCH] More regexp corrections and tweaks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From suggestions by Mattias Engdegård in:
https://lists.gnu.org/r/emacs-devel/2019-03/msg00131.html
* lisp/arc-mode.el (archive-rar-summarize):
* lisp/gnus/gnus-art.el (gnus-button-valid-localpart-regexp):
* lisp/language/ethio-util.el (ethio-fidel-to-tex-buffer):
* lisp/nxml/rng-uri.el (rng-file-name-uri):
* lisp/org/org-mobile.el (org-mobile-apply):
* lisp/progmodes/cperl-mode.el (cperl-init-faces):
* lisp/progmodes/fortran.el (fortran-fill):
* lisp/progmodes/mantemp.el (mantemp-remove-comments)
(mantemp-remove-memfuncs, mantemp-insert-cxx-syntax):
* lisp/speedbar.el (speedbar-directory-buttons-follow):
* lisp/vc/add-log.el (change-log-font-lock-keywords):
Fix more regular expressions that seem to be typos or infelicities.
---
 lisp/arc-mode.el             | 2 +-
 lisp/gnus/gnus-art.el        | 3 +--
 lisp/language/ethio-util.el  | 2 +-
 lisp/nxml/rng-uri.el         | 7 ++++---
 lisp/org/org-mobile.el       | 4 ++--
 lisp/progmodes/cperl-mode.el | 4 ++--
 lisp/progmodes/fortran.el    | 2 +-
 lisp/progmodes/mantemp.el    | 8 ++++----
 lisp/speedbar.el             | 2 +-
 lisp/vc/add-log.el           | 2 +-
 10 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 2afde7ee75..6a58d61a54 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -2016,7 +2016,7 @@ archive-rar-summarize
 (call-process "lsar" nil t nil "-l" (or file copy))
 (if copy (delete-file copy))
 (goto-char (point-min))
- (re-search-forward "^\\(\s+=+\s+\\)+\n")
+ (re-search-forward "^\\(\s+=+\s*\\)+\n")
 (while (looking-at (concat "^\s+[0-9.]+\s+D?-+\s+" ; Flags
 "\\([0-9-]+\\)\s+" ; Size
 "\\([-0-9.%]+\\|-+\\)\s+" ; Ratio
diff --git a/lisp/gnus/gnus-art.el b/lisp/gnus/gnus-art.el
index fa3abfac58..baf44cb483 100644
--- a/lisp/gnus/gnus-art.el
+++ b/lisp/gnus/gnus-art.el
@@ -7376,9 +7376,8 @@ gnus-button-valid-fqdn-regexp
 :group 'gnus-article-buttons
 :type 'regexp)

-;; Regexp suggested by Felix Wiemann in <87oeuomcz9.fsf@news2.ososo.de>
 (defcustom gnus-button-valid-localpart-regexp
- "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
+ "[-a-z0-9$%(*+./=?[_][^<>\")!;:,{}\n\t @]*"
 "Regular expression that matches a localpart of mail addresses or MIDs."
 :version "22.1"
 :group 'gnus-article-buttons
diff --git a/lisp/language/ethio-util.el b/lisp/language/ethio-util.el
index 512d49b9c5..04b15ddd9a 100644
--- a/lisp/language/ethio-util.el
+++ b/lisp/language/ethio-util.el
@@ -804,7 +804,7 @@ ethio-fidel-to-tex-buffer

 ;; Special Ethiopic punctuation.
 (goto-char (point-min))
- (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
+ (while (re-search-forward "\\ce[».?]\\|«\\ce" nil t)
 (cond
 ((= (setq ch (preceding-char)) ?\»)
 (delete-char -1)
diff --git a/lisp/nxml/rng-uri.el b/lisp/nxml/rng-uri.el
index d8f2884f5e..798475bbc3 100644
--- a/lisp/nxml/rng-uri.el
+++ b/lisp/nxml/rng-uri.el
@@ -30,9 +30,10 @@ rng-file-name-uri
 escape them using %HH."
 (setq f (expand-file-name f))
 (let ((url
- (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
- 'rng-percent-encode
- f)))
+ ;; FIXME. Explain why the pattern doesn't also have "!$&'()*+,/:@=".
+ ;; See Internet RFC 3986 section 2.2.
+ (replace-regexp-in-string "[]\0-\s\"#%;<>?[\\^`{|}\177]"
+ 'rng-percent-encode f)))
 (concat "file:"
 (if (and (> (length url) 0)
 (= (aref url 0) ?/))
diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
index 83dcc7b0d1..8b4e895388 100644
--- a/lisp/org/org-mobile.el
+++ b/lisp/org/org-mobile.el
@@ -845,11 +845,11 @@ org-mobile-apply
 (cl-incf cnt-error)
 (throw 'next t))
 (move-marker bos-marker (point))
- (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
+ (if (re-search-forward "^\\** Old value[ \t]*$" eos t)
 (setq old (buffer-substring
 (1+ (match-end 0))
 (progn (outline-next-heading) (point)))))
- (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
+ (if (re-search-forward "^\\** New value[ \t]*$" eos t)
 (setq new (buffer-substring
 (1+ (match-end 0))
 (progn (outline-next-heading)
diff --git a/lisp/progmodes/cperl-mode.el b/lisp/progmodes/cperl-mode.el
index 0fe4b106c5..a9402e17a9 100644
--- a/lisp/progmodes/cperl-mode.el
+++ b/lisp/progmodes/cperl-mode.el
@@ -5736,9 +5736,9 @@ cperl-init-faces
 (if (eq (char-after (cperl-1- (match-end 0))) ?\{ )
 'font-lock-function-name-face
 'font-lock-variable-name-face))))
- '("\\<\\(package\\|require\\|use\\|import\\|no\\|bootstrap\\)[ \t]+\\([a-zA-z_][a-zA-z_0-9:]*\\)[ \t;]" ; require A if B;
+ '("\\<\\(package\\|require\\|use\\|import\\|no\\|bootstrap\\)[ \t]+\\([a-zA-Z_][a-zA-Z_0-9:]*\\)[ \t;]" ; require A if B;
 2 font-lock-function-name-face)
- '("^[ \t]*format[ \t]+\\([a-zA-z_][a-zA-z_0-9:]*\\)[ \t]*=[ \t]*$"
+ '("^[ \t]*format[ \t]+\\([a-zA-Z_][a-zA-Z_0-9:]*\\)[ \t]*=[ \t]*$"
 1 font-lock-function-name-face)
 (cond ((featurep 'font-lock-extra)
 '("\\([]}\\\\%@>*&]\\|\\$[a-zA-Z0-9_:]*\\)[ \t]*{[ \t]*\\(-?[a-zA-Z0-9_:]+\\)[ \t]*}"
diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
index c1a267f4c5..b8aa521cf6 100644
--- a/lisp/progmodes/fortran.el
+++ b/lisp/progmodes/fortran.el
@@ -2052,7 +2052,7 @@ fortran-fill
 (when (<= (point) bos)
 (move-to-column (1+ fill-column))
 ;; What is this doing???
- (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
+ (or (re-search-forward "[-\t\n,'+/*)=]" eol t)
 (goto-char bol)))
 (if (bolp)
 (re-search-forward "[ \t]" opoint t))
diff --git a/lisp/progmodes/mantemp.el b/lisp/progmodes/mantemp.el
index 9beeb4aae6..4190a84727 100644
--- a/lisp/progmodes/mantemp.el
+++ b/lisp/progmodes/mantemp.el
@@ -89,7 +89,7 @@ mantemp-remove-comments
 (save-excursion
 (goto-char (point-min))
 (message "Removing comments")
- (while (re-search-forward "^[A-z.()+0-9: ]*`\\|'.*$" nil t)
+ (while (re-search-forward "^[a-zA-Z.()+0-9: ]*`\\|'.*$" nil t)
 (replace-match ""))))

 (defun mantemp-remove-memfuncs ()
@@ -99,14 +99,14 @@ mantemp-remove-memfuncs
 (goto-char (point-min))
 (message "Removing member function extensions")
 (while (re-search-forward
- "^[A-z :&*<>~=,0-9+]*>::operator " nil t nil)
+ "^[a-zA-Z :&*<>~=,0-9+]*>::operator " nil t nil)
 (progn
 (backward-char 11)
 (delete-region (point) (line-end-position))))
 ;; Remove other member function extensions.
 (goto-char (point-min))
 (message "Removing member function extensions")
- (while (re-search-forward "^[A-z :&*<>~=,0-9+]*>::" nil t nil)
+ (while (re-search-forward "^[a-zA-Z :&*<>~=,0-9+]*>::" nil t nil)
 (progn
 (backward-char 2)
 (delete-region (point) (line-end-position))))))
@@ -154,7 +154,7 @@ mantemp-insert-cxx-syntax
 (goto-char (point-min))
 (message "Inserting 'template' for functions")
 (while (re-search-forward
- "^template class [A-z :&*<>~=,0-9+!]*(" nil t nil)
+ "^template class [a-zA-Z :&*<>~=,0-9+!]*(" nil t nil)
 (progn
 (beginning-of-line)
 (forward-word-strictly 1)
diff --git a/lisp/speedbar.el b/lisp/speedbar.el
index 46b3f2ea90..a7fd564e94 100644
--- a/lisp/speedbar.el
+++ b/lisp/speedbar.el
@@ -3388,7 +3388,7 @@ speedbar-directory-buttons-follow
 "Speedbar click handler for default directory buttons.
 TEXT is the button clicked on. TOKEN is the directory to follow.
 INDENT is the current indentation level and is unused."
- (if (string-match "^[A-z]:$" token)
+ (if (string-match "^[A-Za-z]:$" token)
 (setq default-directory (concat token "/"))
 (setq default-directory token))
 ;; Because we leave speedbar as the current buffer,
diff --git a/lisp/vc/add-log.el b/lisp/vc/add-log.el
index 9fe06bbf52..f9efd44c5c 100644
--- a/lisp/vc/add-log.el
+++ b/lisp/vc/add-log.el
@@ -239,7 +239,7 @@ change-log-font-lock-keywords
 ;; wrongly with a non-date line existing as a random note. In
 ;; addition, using any kind of fixed setting like this doesn't
 ;; work if a user customizes add-log-time-format.
- ("^[0-9-]+ +\\|^ \\{11,\\}\\|^\t \\{3,\\}\\|^\\(Sun\\|Mon\\|Tue\\|Wed\\|Thu\\|Fri\\|Sat\\) [A-z][a-z][a-z] [0-9:+ ]+"
+ ("^[0-9-]+ +\\|^ \\{11,\\}\\|^\t \\{3,\\}\\|^\\(Sun\\|Mon\\|Tue\\|Wed\\|Thu\\|Fri\\|Sat\\) [A-Z][a-z][a-z] [0-9:+ ]+"
 (0 'change-log-date)
 ;; Name and e-mail; some people put e-mail in parens, not angles.
 ("\\([^<(]+?\\)[ \t]*[(<]\\([A-Za-z0-9_.+-]+@[A-Za-z0-9_.-]+\\)[>)]" nil nil
--
2.20.1

--------------6B375B7658124D733DCC9700--