From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Regexp scan of Emacs (April 19)
Date: Fri, 19 Apr 2019 09:04:55 -0700
Organization: UCLA Computer Science Department
Message-ID: <031ec3d9-a8ad-e656-b0ac-465ec00a285d@cs.ucla.edu>
References: <90232AC2-3228-4C8F-AD84-FFB6A30F51AF@acm.org>
In-Reply-To: <90232AC2-3228-4C8F-AD84-FFB6A30F51AF@acm.org>
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------43E7B7ACB7295C8FBC242E31"
Content-Language: en-US
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
To: Mattias Engdegård
Cc: Emacs developers
Xref: news.gmane.org gmane.emacs.devel:235664

This is a multi-part message in MIME format.
--------------43E7B7ACB7295C8FBC242E31
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 4/19/19 2:39 AM, Mattias Engdegård wrote:

> This is the latest scan of errors and oddities in regexps in the Emacs
> source tree.

> New this time is an experimental check for branch subsumption: whether
> one branch in an or-expression matches a superset of another, like
> "[ab]\\|a". Please tell me if you believe this might be useful, so that
> I know whether to include it in the next release of xr.

Thanks, these all look useful to me. I installed the attached patch to
fix the glitches.

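For readers new to the term, branch subsumption can be illustrated with a small Emacs Lisp sketch. The helper name my-same-matches-p is hypothetical (it is not part of Emacs or xr), and it is only a spot check on sample strings, not a proof that two regexps are equivalent.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or xr.
(defun my-same-matches-p (re1 re2 samples)
  "Return non-nil if RE1 and RE2 find the same first match in each of SAMPLES.
This only spot-checks the given strings; it does not prove equivalence."
  (let ((same t))
    (dolist (s samples same)
      (let ((m1 (and (string-match re1 s)
                     (list (match-beginning 0) (match-end 0))))
            (m2 (and (string-match re2 s)
                     (list (match-beginning 0) (match-end 0)))))
        (unless (equal m1 m2)
          (setq same nil))))))

;; The example from the report: "a" can only match where "[ab]" already does.
(my-same-matches-p "[ab]\\|a" "[ab]" '("a" "b" "ca" "xyz" ""))
;; => t

;; The Ratio field of archive-rar-summarize, before and after the patch.
(my-same-matches-p "\\([-0-9.%]+\\|-+\\)" "\\([-0-9.%]+\\)"
                   '("--" "12.5%" "-->" "n/a"))
;; => t
```

Agreement on a handful of strings is weak evidence at best, which is presumably why a static check such as the one in xr is the more useful tool here.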
--------------43E7B7ACB7295C8FBC242E31
Content-Type: text/x-patch;
 name="0001-Fix-regexp-branches-that-subsume-other-branches.patch"
Content-Disposition: attachment;
 filename*0="0001-Fix-regexp-branches-that-subsume-other-branches.patch"
Content-Transfer-Encoding: 8bit

>From 872ec904253e2399bcf772f7995c363ca0f8a262 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 19 Apr 2019 09:00:04 -0700
Subject: [PATCH] Fix regexp branches that subsume other branches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problems reported by Mattias Engdegård in:
https://lists.gnu.org/r/emacs-devel/2019-04/msg00803.html
* lisp/arc-mode.el (archive-rar-summarize):
* lisp/eshell/em-hist.el (eshell-hist-word-designator):
* lisp/info.el (Info-dir-remove-duplicates):
* lisp/international/ja-dic-cnv.el (skkdic-convert-postfix)
(skkdic-convert-prefix, skkdic-collect-okuri-nasi):
* lisp/progmodes/cc-awk.el (c-awk-esc-pair-re):
* lisp/xml.el (xml-att-type-re):
Omit regexp branches that subsume other branches.
* lisp/progmodes/cperl-mode.el (cperl-beautify-regexp-piece):
$ and ^ aren’t simple-codes.
---
 lisp/arc-mode.el                 |  2 +-
 lisp/eshell/em-hist.el           |  2 +-
 lisp/info.el                     |  2 +-
 lisp/international/ja-dic-cnv.el | 10 +++++-----
 lisp/progmodes/cc-awk.el         |  2 +-
 lisp/progmodes/cperl-mode.el     |  2 +-
 lisp/xml.el                      |  1 -
 7 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 6a58d61a54..1c88f9a1a1 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -2019,7 +2019,7 @@ archive-rar-summarize
     (re-search-forward "^\\(\s+=+\s*\\)+\n")
     (while (looking-at (concat "^\s+[0-9.]+\s+D?-+\s+"   ; Flags
                                "\\([0-9-]+\\)\s+"        ; Size
-                               "\\([-0-9.%]+\\|-+\\)\s+" ; Ratio
+                               "\\([-0-9.%]+\\)\s+"      ; Ratio
                                "\\([0-9a-zA-Z]+\\)\s+"   ; Mode
                                "\\([0-9-]+\\)\s+"        ; Date
                                "\\([0-9:]+\\)\s+"        ; Time
diff --git a/lisp/eshell/em-hist.el b/lisp/eshell/em-hist.el
index 614faaa131..adb028002b 100644
--- a/lisp/eshell/em-hist.el
+++ b/lisp/eshell/em-hist.el
@@ -153,7 +153,7 @@ eshell-hist-event-designator
   :group 'eshell-hist)
 
 (defcustom eshell-hist-word-designator
-  "^:?\\([0-9]+\\|[$^%*]\\)?\\(\\*\\|-[0-9]*\\|[$^%*]\\)?"
+  "^:?\\([0-9]+\\|[$^%*]\\)?\\(-[0-9]*\\|[$^%*]\\)?"
   "The regexp used to identify history word designators."
   :type 'regexp
   :group 'eshell-hist)
diff --git a/lisp/info.el b/lisp/info.el
index f3b413a2f9..2e5f433dc8 100644
--- a/lisp/info.el
+++ b/lisp/info.el
@@ -1531,7 +1531,7 @@ Info-dir-remove-duplicates
           (save-restriction
             (narrow-to-region start (point))
             (goto-char (point-min))
-            (while (re-search-forward "^\\* \\([^:\n]+:\\(:\\|[^.\n]+\\).\\)" nil 'move)
+            (while (re-search-forward "^\\* \\([^:\n]+:[^.\n]+.\\)" nil 'move)
               ;; Fold case straight away; `member-ignore-case' here wasteful.
               (let ((x (downcase (match-string 1))))
                 (if (member x seen)
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index 578cd63a59..e721083189 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -124,7 +124,7 @@ skkdic-convert-postfix
         (setq l (cdr l)))))
 
   ;; Search postfix entries.
-  (while (re-search-forward "^[#<>?]\\(\\(\\cH\\|ー\\)+\\) " nil t)
+  (while (re-search-forward "^[#<>?]\\(\\cH+\\) " nil t)
     (let ((kana (match-string-no-properties 1))
           str candidates)
       (while (looking-at "/[#0-9 ]*\\([^/\n]*\\)/")
@@ -157,7 +157,7 @@ skkdic-convert-prefix
     (insert ";; Setting prefix entries.\n"
             "(skkdic-set-prefix\n"))
   (save-excursion
-    (while (re-search-forward "^\\(\\(\\cH\\|ー\\)+\\)[<>?] " nil t)
+    (while (re-search-forward "^\\(\\cH+\\)[<>?] " nil t)
       (let ((kana (match-string-no-properties 1))
             str candidates)
         (while (looking-at "/\\([^/\n]+\\)/")
@@ -275,11 +275,11 @@ skkdic-collect-okuri-nasi
     (let ((progress (make-progress-reporter "Collecting OKURI-NASI entries"
                                             (point) (point-max)
                                             nil 10)))
-      (while (re-search-forward "^\\(\\(\\cH\\|ー\\)+\\) \\(/\\cj.*\\)/$"
+      (while (re-search-forward "^\\(\\cH+\\) \\(/\\cj.*\\)/$"
                                 nil t)
         (let ((kana (match-string-no-properties 1))
-              (candidates (skkdic-get-candidate-list (match-beginning 3)
-                                                     (match-end 3))))
+              (candidates (skkdic-get-candidate-list (match-beginning 2)
+                                                     (match-end 2))))
           (setq skkdic-okuri-nasi-entries
                 (cons (cons kana candidates) skkdic-okuri-nasi-entries))
           (progress-reporter-update progress (point))
diff --git a/lisp/progmodes/cc-awk.el b/lisp/progmodes/cc-awk.el
index 70aa3c4b1f..1a67a95927 100644
--- a/lisp/progmodes/cc-awk.el
+++ b/lisp/progmodes/cc-awk.el
@@ -95,7 +95,7 @@ awk-mode-syntax-table
 ;; Emacs has in the past used \r to mark hidden lines in some fashion (and
 ;; maybe still does).
 
-(defconst c-awk-esc-pair-re "\\\\\\(.\\|\n\\|\r\\|\\'\\)")
+(defconst c-awk-esc-pair-re "\\\\\\(.\\|\n\\|\\'\\)")
 ;;   Matches any escaped (with \) character-pair, including an escaped newline.
 (defconst c-awk-non-eol-esc-pair-re "\\\\\\(.\\|\\'\\)")
 ;;   Matches any escaped (with \) character-pair, apart from an escaped newline.
diff --git a/lisp/progmodes/cperl-mode.el b/lisp/progmodes/cperl-mode.el
index 73b55e29a5..ba007d67c0 100644
--- a/lisp/progmodes/cperl-mode.el
+++ b/lisp/progmodes/cperl-mode.el
@@ -7983,7 +7983,7 @@ cperl-beautify-regexp-piece
                           "\\|"         ; $ ^
                             "[$^]"
                           "\\|"         ; simple-code simple-code*?
-                            "\\(\\\\.\\|[^][()#|*+?\n]\\)\\([*+{?]\\??\\)?" ; 4 5
+                            "\\(\\\\.\\|[^][()#|*+?$^\n]\\)\\([*+{?]\\??\\)?" ; 4 5
                           "\\|"         ; Class
                             "\\(\\[\\)" ; 6
                           "\\|"         ; Grouping
diff --git a/lisp/xml.el b/lisp/xml.el
index b5b923f863..1f3c05f4d9 100644
--- a/lisp/xml.el
+++ b/lisp/xml.el
@@ -245,7 +245,6 @@ xml-enumerated-type-re
 ;; [54] AttType       ::= StringType | TokenizedType | EnumeratedType
 ;; [55] StringType    ::= 'CDATA'
 (defconst xml-att-type-re (concat "\\(?:CDATA\\|" xml-tokenized-type-re
-                                  "\\|" xml-notation-type-re
                                   "\\|" xml-enumerated-type-re "\\)"))
 
 ;; [60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
-- 
2.20.1

--------------43E7B7ACB7295C8FBC242E31--
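One detail of the skkdic-collect-okuri-nasi hunk may be worth spelling out: removing the nested \\(...\\) group renumbers every group after it, which is why the patch also changes (match-beginning 3) and (match-end 3) to use group 2. A minimal sketch of that effect follows. The function name my-group-shift-demo, the ASCII regexps and the sample line are made-up stand-ins for the kana patterns, not code from Emacs.

```elisp
;; Illustrative stand-in: [ab] plays the role of the kana character class.
(defun my-group-shift-demo ()
  "Return the candidate text as seen by the old and new regexp shapes."
  (let ((line "abba /x/"))
    ;; Old shape: the nested group is number 2, so the candidates are group 3.
    (string-match "^\\(\\(a\\|b\\)+\\) \\(/x\\)/$" line)
    (let ((old (match-string 3 line)))
      ;; New shape: no nested group, so the candidates move to group 2.
      (string-match "^\\([ab]+\\) \\(/x\\)/$" line)
      (list old (match-string 2 line)))))

(my-group-shift-demo)
;; => ("/x" "/x")
```

Group 1 is unaffected because it opens before the removed group, so the (match-string-no-properties 1) calls in the patch stay as they are.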