unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Scan of regexps in Emacs (March 17)
@ 2019-03-17 13:50 Mattias Engdegård
  2019-03-19  1:21 ` Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Mattias Engdegård @ 2019-03-17 13:50 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 549 bytes --]

This is a scan of mistakes in regexp strings in the Emacs source tree.

New this time, in addition to general improvements to the regexp finder and partial evaluator, is support for parsing and linting arguments to `skip-char-forward' and `skip-char-backward'. This turned out to be quite fruitful, since the syntax of skip-set strings is widely misunderstood.

The scan was made using trawl 1.3 (https://github.com/mattiase/trawl) and xr 1.8 (ELPA).

Please help fixing these. In particular, someone who knows Hebrew would be of assistance.

[-- Attachment #2: trawl.log --]
[-- Type: application/octet-stream, Size: 11983 bytes --]

;; -*- mode: compilation -*-
lisp/cedet/data-debug.el:923:23: In call to skip-chars-forward: Character `<' included in range `*->' (pos 4)
  " *-><[]"
   ....^
lisp/cedet/data-debug.el:930:23: In call to skip-chars-forward: Character `<' included in range `*->' (pos 4)
  " *-><[]"
   ....^
lisp/cedet/data-debug.el:1017:23: In call to skip-chars-forward: Character `<' included in range `*->' (pos 4)
  " *-><[]"
   ....^
lisp/emacs-lisp/checkdoc.el:1514:28: In call to skip-chars-forward: Unnecessarily escaped `*' (pos 1)
  "\"\\*"
   ..^
lisp/emulation/viper-ex.el:568:39: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[a-zA-Z!=>&~]"
   ^
lisp/gnus/gnus-cite.el:1131:40: In call to split-string: Duplicated ` ' inside character alternative (pos 3)
  "[ \t [:alnum:]]+"
   ....^
lisp/gnus/nnir.el:1188:23: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[[:blank:]]"
   ^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 23)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   .......................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ק' inside character alternative (pos 24)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ר' inside character alternative (pos 26)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ..........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 27)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ...........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ש' inside character alternative (pos 28)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ............................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 30)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ..............................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 32)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ................................^
lisp/mail/mail-extr.el:324:3: In mail-extr-bad-dot-pattern: Reversed range ``--' matches nothing (pos 19)
  "\\([][[:alnum:]{|}'~`---][[:alpha:]`'.]\\)\\.+\\([[:alpha:]]\\)"
   ....................^
lisp/mail/mail-extr.el:342:3: In mail-extr-full-name-suffix-pattern: Reversed range ``--' matches nothing (pos 53)
  "\\(,? ?\\([JjSs][Rr]\\.?\\|V?I+V?\\)\\)\\([^][[:alnum:]{|}'~`---]\\([^][[:alnum:]{|}'~`---]\\|\\'\\)\\|\\'\\)"
   ............................................................^
lisp/mail/mail-extr.el:342:3: In mail-extr-full-name-suffix-pattern: Reversed range ``--' matches nothing (pos 78)
  "\\(,? ?\\([JjSs][Rr]\\.?\\|V?I+V?\\)\\)\\([^][[:alnum:]{|}'~`---]\\([^][[:alnum:]{|}'~`---]\\|\\'\\)\\|\\'\\)"
   ......................................................................................^
lisp/mail/mail-extr.el:379:3: In mail-extr-name-pattern: Reversed range ``--' matches nothing (pos 30)
  "\\b[[:alpha:]][][[:alnum:]{|}'~`---]*[[:alpha:]`'.]"
   ...............................^
lisp/mail/mail-extr.el:431:3: In mail-extr-two-name-pattern: Reversed range ``--' matches nothing (pos 114)
  "\\`\\(\\b[[:alpha:]][][[:alnum:]{|}'~`]+[[:alpha:]`'.]\\|\\b[[:alpha:]]\\([. ]\\|\\b\\)\\) +\\(\\b[[:alpha:]][][[:alnum:]{|}'~`---]*[[:alpha:]`'.]\\)\\(,\\|\\'\\)"
   ..............................................................................................................................^
lisp/nxml/rng-nxml.el:163:48: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[[:alnum:]_.-:]"
   ^
lisp/nxml/rng-nxml.el:210:53: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[[:alnum:]_.-:]"
   ^
lisp/nxml/rng-nxml.el:250:53: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[[:alnum:]_.-:]"
   ^
lisp/org/org-datetree.el:81:8: In call to org-datetree--find-create: Duplicated `%' inside character alternative (pos 56)
  "^\\*+[ \t]+\\([12][0-9]\\{3\\}\\)\\(\\s-*?\\([ \t]:[[:alnum:]:_@#%%]+:\\)?\\s-*$\\)"
   ..................................................................^
lisp/org/org-datetree.el:128:8: In call to org-datetree--find-create: Duplicated `%' inside character alternative (pos 56)
  "^\\*+[ \t]+\\([12][0-9]\\{3\\}\\)\\(\\s-*?\\([ \t]:[[:alnum:]:_@#%%]+:\\)?\\s-*$\\)"
   ..................................................................^
lisp/org/org-pcomplete.el:85:35: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[ \t\n]"
   ^
lisp/org/org.el:14956:49: In call to skip-chars-forward: Unnecessarily escaped `*' (pos 0)
  "\\*"
   ^
lisp/org/org.el:20618:36: In call to skip-chars-forward: Duplicated character `^' (pos 10)
  "-+^/*0-9eE^."
   ..........^
lisp/org/org.el:22840:26: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[ \t]"
   ^
lisp/progmodes/ada-mode.el:770:24: In call to skip-chars-backward: Stray `\' at end of string (pos 14)
  "-a-zA-Z0-9_:./\\"
   ..............^
lisp/progmodes/cperl-mode.el:7251:3: In cperl-not-bad-style-regexp: Duplicated `\' inside character alternative (pos 170)
  "[^-\t <>=+]\\(--\\|\\+\\+\\)\\|[a-zA-Z0-9_][|&][a-zA-Z0-9_$]\\|&[(a-zA-Z0-9_$]\\|<\\$?\\sw+\\(\\.\\(\\sw\\|_\\)+\\)?>\\|-[a-zA-Z][ \t]+[_$\"'`a-zA-Z]\\|-[0-9]\\|\\+\\+\\|--\\|.->\\|->\\|\\[-\\|\\\\[&$@*\\\\]\\|^=\\|\\$.\\|<<[a-zA-Z_'\"`]\\|||\\|//\\|&&\\|[CBIXSLFZ]<\\(\\sw\\|\\s \\|\\s_\\|[\n]\\)*>\\|-[a-zA-Z_0-9]+[ \t]*=>"
   ............................................................................................................................................................................................................^
lisp/progmodes/cperl-mode.el:8177:36: In call to skip-chars-backward: Stray `\' at end of string (pos 0)
  "\\"
   ^
lisp/progmodes/flymake-proc.el:1136:34: In call to flymake-proc--init-create-temp-source-and-master-buffer-copy: Escaped non-special character `i' (pos 5)
  "[ \t]*\\in\\(?:put\\|clude\\)[ \t]*{\\(.*%s\\)}"
   ......^
lisp/progmodes/flymake.el:423:47: In call to skip-chars-backward: Duplicated character `\t' (pos 3)
  " \t\f\t\n"
   .....^
lisp/progmodes/fortran.el:1798:32: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[ \t0-9]"
   ^
lisp/progmodes/idlw-complete-structtag.el:131:32: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[a-zA-Z0-9._$]"
   ^
lisp/progmodes/idlwave.el:7593:35: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[a-zA-Z0-9_$]"
   ^
lisp/progmodes/prolog.el:2829:23: In call to looking-at: Repetition of repetition (pos 54)
  ":-[ \t]*\\(pred\\|mode\\)[ \t]+\\([[:lower:]$][[:alnum:]_$]*+\\)"
   ............................................................^
lisp/progmodes/prolog.el:2953:39: In call to skip-chars-forward: Unnecessarily escaped `.' (pos 3)
  "^ (\\."
   ...^
lisp/progmodes/ruby-mode.el:1493:42: In call to skip-chars-forward: Unnecessarily escaped `+' (pos 11)
  ",.:;|&^~=!?\\+\\-\\*"
   ...........^
lisp/progmodes/ruby-mode.el:1493:42: In call to skip-chars-forward: Unnecessarily escaped `*' (pos 15)
  ",.:;|&^~=!?\\+\\-\\*"
   .................^
lisp/progmodes/ruby-mode.el:1536:34: In call to skip-chars-backward: Unnecessarily escaped `+' (pos 14)
  " \t\n,.:;|&^~=!?\\+\\-\\*"
   ................^
lisp/progmodes/ruby-mode.el:1536:34: In call to skip-chars-backward: Unnecessarily escaped `*' (pos 18)
  " \t\n,.:;|&^~=!?\\+\\-\\*"
   ......................^
lisp/progmodes/ruby-mode.el:1549:67: In call to skip-chars-backward: Stray `\' at end of string (pos 0)
  "\\"
   ^
lisp/progmodes/verilog-mode.el:10497:44: In call to skip-chars-backward: Suspect skip set framed in `[...]' (pos 0)
  "[a-zA-Z0-9_`]"
   ^
lisp/textmodes/bib-mode.el:206:4: In bib-capitalize-title-stop-regexp: Duplicated alternative branch (pos 27)
  "\\(the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\(\\b\\|'\\)"
   ..................................^
lisp/textmodes/bib-mode.el:206:4: In bib-capitalize-title-stop-regexp: Duplicated alternative branch (pos 44)
  "\\(the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\(\\b\\|'\\)"
   .......................................................^
lisp/textmodes/refbib.el:145:4: In r2b-capitalize-title-stop-regexp: Duplicated alternative branch (pos 27)
  "\\(the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\(\\b\\|'\\)"
   ..................................^
lisp/textmodes/refbib.el:145:4: In r2b-capitalize-title-stop-regexp: Duplicated alternative branch (pos 44)
  "\\(the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\(\\b\\|'\\)"
   .......................................................^
lisp/textmodes/refbib.el:191:4: In r2b-stop-regexp: Duplicated alternative branch (pos 43)
  "\\`\\(\\(Some\\|What\\|the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\('\\w*\\)?\\W+\\)*\\([A-Z0-9]+\\)"
   ......................................................^
lisp/textmodes/refbib.el:191:4: In r2b-stop-regexp: Duplicated alternative branch (pos 60)
  "\\`\\(\\(Some\\|What\\|the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|by\\|with\\|that\\|its\\)\\('\\w*\\)?\\W+\\)*\\([A-Z0-9]+\\)"
   ...........................................................................^
lisp/textmodes/reftex-parse.el:1003:39: In call to skip-chars-forward: Unnecessarily escaped `[' (pos 1)
  "{\\["
   .^
lisp/textmodes/rst.el:228:50: In call to rst-extract-version: Repetition of repetition (pos 3)
  ".+?+"
   ...^
lisp/url/url-http.el:516:23: In call to skip-chars-forward: Duplicated character `T' (pos 2)
  "HTTP/"
   ..^
lisp/align.el:386:3: In align-rules-list (make-assignment): Duplicated `\' inside character alternative (pos 35)
  "^\\s-*\\w+\\(\\s-*\\):?=\\(\\s-*\\)\\([^\t\n \\\\]\\|$\\)"
   ...............................................^
lisp/auth-source-pass.el:137:31: In call to split-string: Escaped non-special character `\n' (pos 0)
  "\\\n"
   ^
lisp/auth-source-pass.el:142:39: In call to split-string: Escaped non-special character `\n' (pos 0)
  "\\\n"
   ^
lisp/auth-source-pass.el:142:48: In call to split-string: Escaped non-special character ` ' (pos 0)
  "\\ "
   ^
lisp/comint.el:1621:42: In call to comint-how-many-region: Duplicated `\' inside character alternative (pos 8)
  "\\(^\\|[^\\\\]\\)'"
   ...........^
lisp/comint.el:1622:42: In call to comint-how-many-region: Duplicated `\' inside character alternative (pos 8)
  "\\(^\\|[^\\\\]\\)\""
   ...........^
lisp/comint.el:2084:34: In call to string-match: Unescaped literal `^' (pos 3)
  "\\(^^\\)\\1+"
   ....^
lisp/pcomplete.el:771:35: In call to skip-chars-forward: Stray `\' at end of string (pos 4)
  "^ \t\n\\"
   ......^
lisp/term.el:1987:40: In call to term-how-many-region: Duplicated `\' inside character alternative (pos 8)
  "\\(^\\|[^\\\\]\\)'"
   ...........^
lisp/term.el:1988:40: In call to term-how-many-region: Duplicated `\' inside character alternative (pos 8)
  "\\(^\\|[^\\\\]\\)\""
   ...........^
test/lisp/progmodes/f90-tests.el:101:25: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[ \t]"
   ^

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-17 13:50 Scan of regexps in Emacs (March 17) Mattias Engdegård
@ 2019-03-19  1:21 ` Paul Eggert
  2019-03-19 10:34   ` Mattias Engdegård
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2019-03-19  1:21 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

Thanks for doing that. I fixed all the regular expressions I could
easily fix, by installing the attached patch into master.

I saw two false alarms, both in lisp/org/org-datetree.el. Both are of
the form (re-search-forward (format "[chars%%]+" ...) nil t), in which
both '%' characters are needed. Perhaps you could tweak the trawler to
not report these?

The remaining unfixed issues are all in lisp/mail/footnote.el's
footnote-hebrew-numeric-regex, where Eli's expertise would help. I'm
attaching these unfixed issues in mail-extr.txt.

I assume this was a complete trawl, so that I can ignore the earlier
scans you emailed (I got behind in looking into them). If not, please
let me know.



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-more-regular-expression-typos.patch --]
[-- Type: text/x-patch; name="0001-Fix-more-regular-expression-typos.patch", Size: 31675 bytes --]

From 8887f1ebff2eb2768dafe24a905ff7d08cbf96da Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 18 Mar 2019 17:02:01 -0700
Subject: [PATCH] Fix more regular expression typos
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Mattias Engdegård in:
https://lists.gnu.org/r/emacs-devel/2019-03/msg00548.html
except that I didn’t address the issues involving Hebrew,
or involving comint-prompt-regexp.
* lisp/align.el (align-rules-list, align-exclude-rules-list):
* lisp/auth-source-pass.el (auth-source-pass--parse-secret)
(auth-source-pass--parse-data):
* lisp/cedet/data-debug.el (data-debug-next)
(data-debug-prev, data-debug-expand-or-contract):
* lisp/comint.el (comint-within-quotes):
* lisp/emacs-lisp/checkdoc.el (checkdoc-this-string-valid-engine):
* lisp/emulation/viper-ex.el (ex-cmd-complete):
* lisp/gnus/gnus-cite.el (gnus-message-search-citation-line):
* lisp/gnus/nnir.el (nnir-imap-end-of-input):
* lisp/mail/mail-extr.el (mail-extr-all-letters):
* lisp/minibuffer.el (minibuffer-maybe-quote-filename):
* lisp/nxml/rng-nxml.el (rng-complete-tag)
(rng-complete-end-tag, rng-complete-attribute-name):
* lisp/obsolete/vip.el (vip-get-ex-token, vip-get-ex-pat):
* lisp/org/org-pcomplete.el (org-thing-at-point):
* lisp/org/org.el (org-set-tags)
(org-increase-number-at-point)
(org-fill-line-break-nobreak-p):
* lisp/pcomplete.el (pcomplete-parse-comint-arguments):
* lisp/progmodes/ada-mode.el (ada-compile-goto-error):
* lisp/progmodes/cperl-mode.el (cperl-highlight-charclass)
(cperl-find-pods-heres, cperl-not-bad-style-regexp)
(cperl-regext-to-level-start):
* lisp/progmodes/ebnf-yac.el (ebnf-yac-skip-spaces):
* lisp/progmodes/flymake-proc.el (flymake-proc-master-tex-init):
* lisp/progmodes/flymake.el (flymake-diag-region):
* lisp/progmodes/fortran.el (fortran-current-line-indentation):
* lisp/progmodes/idlw-complete-structtag.el:
(idlwave-complete-structure-tag):
* lisp/progmodes/idlwave.el (idlwave-complete-sysvar-or-tag):
* lisp/progmodes/prolog.el (prolog-pred-end)
(prolog-clause-info):
* lisp/progmodes/ruby-mode.el (ruby-forward-sexp)
(ruby-backward-sexp):
* lisp/progmodes/verilog-mode.el (verilog-repair-open-comma):
* lisp/term.el (term-within-quotes):
* lisp/textmodes/bib-mode.el (bib-capitalize-title-stop-words):
* lisp/textmodes/refbib.el (r2b-capitalize-title-stop-words):
* lisp/textmodes/reftex-parse.el (reftex-nth-arg):
* lisp/textmodes/rst.el (rst-svn-rev):
* lisp/url/url-http.el (url-http-parse-response):
* test/lisp/progmodes/f90-tests.el (f90-test-bug3730):
Fix regular expression typos.
---
 lisp/align.el                             |  4 ++--
 lisp/auth-source-pass.el                  |  4 ++--
 lisp/cedet/data-debug.el                  |  6 +++---
 lisp/comint.el                            |  4 ++--
 lisp/emacs-lisp/checkdoc.el               |  2 +-
 lisp/emulation/viper-ex.el                |  2 +-
 lisp/gnus/gnus-cite.el                    |  2 +-
 lisp/gnus/nnir.el                         |  2 +-
 lisp/mail/mail-extr.el                    | 10 +---------
 lisp/minibuffer.el                        |  2 +-
 lisp/nxml/rng-nxml.el                     |  6 +++---
 lisp/obsolete/vip.el                      |  6 +++---
 lisp/org/org-pcomplete.el                 |  2 +-
 lisp/org/org.el                           |  6 +++---
 lisp/pcomplete.el                         |  2 +-
 lisp/progmodes/ada-mode.el                |  2 +-
 lisp/progmodes/cperl-mode.el              | 18 +++++++++---------
 lisp/progmodes/ebnf-yac.el                |  2 +-
 lisp/progmodes/flymake-proc.el            |  2 +-
 lisp/progmodes/flymake.el                 |  2 +-
 lisp/progmodes/fortran.el                 |  2 +-
 lisp/progmodes/idlw-complete-structtag.el |  2 +-
 lisp/progmodes/idlwave.el                 |  2 +-
 lisp/progmodes/prolog.el                  |  4 ++--
 lisp/progmodes/ruby-mode.el               |  6 +++---
 lisp/progmodes/verilog-mode.el            |  2 +-
 lisp/term.el                              |  4 ++--
 lisp/textmodes/bib-mode.el                |  2 +-
 lisp/textmodes/refbib.el                  |  2 +-
 lisp/textmodes/reftex-parse.el            |  2 +-
 lisp/textmodes/rst.el                     |  2 +-
 lisp/url/url-http.el                      |  2 +-
 test/lisp/progmodes/f90-tests.el          |  2 +-
 33 files changed, 56 insertions(+), 64 deletions(-)

diff --git a/lisp/align.el b/lisp/align.el
index 594d15eee1..a81498be5d 100644
--- a/lisp/align.el
+++ b/lisp/align.el
@@ -452,7 +452,7 @@ align-rules-list
      (tab-stop . nil))
 
     (make-assignment
-     (regexp   . "^\\s-*\\w+\\(\\s-*\\):?=\\(\\s-*\\)\\([^\t\n \\\\]\\|$\\)")
+     (regexp   . "^\\s-*\\w+\\(\\s-*\\):?=\\(\\s-*\\)\\([^\t\n \\]\\|$\\)")
      (group    . (1 2))
      (modes    . '(makefile-mode))
      (tab-stop . nil))
@@ -759,7 +759,7 @@ align-exclude-rules-list
 	  (lambda (end reverse)
 	    (funcall (if reverse 're-search-backward
 		       're-search-forward)
-		     (concat "[^ \t\n\\\\]"
+		     (concat "[^ \t\n\\]"
 			     (regexp-quote comment-start)
 			     "\\(.+\\)$") end t))))
      (modes  . align-open-comment-modes))
diff --git a/lisp/auth-source-pass.el b/lisp/auth-source-pass.el
index deb805a6e1..29ff9c6685 100644
--- a/lisp/auth-source-pass.el
+++ b/lisp/auth-source-pass.el
@@ -134,12 +134,12 @@ auth-source-pass-parse-entry
 (defun auth-source-pass--parse-secret (contents)
   "Parse the password-store data in the string CONTENTS and return its secret.
 The secret is the first line of CONTENTS."
-  (car (split-string contents "\\\n" t)))
+  (car (split-string contents "\n" t)))
 
 (defun auth-source-pass--parse-data (contents)
   "Parse the password-store data in the string CONTENTS and return an alist.
 CONTENTS is the contents of a password-store formatted file."
-  (let ((lines (split-string contents "\\\n" t "\\\s")))
+  (let ((lines (split-string contents "\n" t "\\\s")))
     (seq-remove #'null
                 (mapcar (lambda (line)
                           (let ((pair (mapcar (lambda (s) (string-trim s))
diff --git a/lisp/cedet/data-debug.el b/lisp/cedet/data-debug.el
index ba312433d3..eeec6b5834 100644
--- a/lisp/cedet/data-debug.el
+++ b/lisp/cedet/data-debug.el
@@ -920,14 +920,14 @@ data-debug-next
   (interactive)
   (forward-line 1)
   (beginning-of-line)
-  (skip-chars-forward " *-><[]" (point-at-eol)))
+  (skip-chars-forward "- *><[]" (point-at-eol)))
 
 (defun data-debug-prev ()
   "Go to the previous line in the Ddebug buffer."
   (interactive)
   (forward-line -1)
   (beginning-of-line)
-  (skip-chars-forward " *-><[]" (point-at-eol)))
+  (skip-chars-forward "- *><[]" (point-at-eol)))
 
 (defun data-debug-next-expando ()
   "Go to the next line in the Ddebug buffer.
@@ -1014,7 +1014,7 @@ data-debug-expand-or-contract
 	   (data-debug-current-line-expanded-p))
       (data-debug-contract-current-line)
     (data-debug-expand-current-line))
-  (skip-chars-forward " *-><[]" (point-at-eol)))
+  (skip-chars-forward "- *><[]" (point-at-eol)))
 
 (defun data-debug-expand-or-contract-mouse (event)
   "Expand or contract anything at event EVENT."
diff --git a/lisp/comint.el b/lisp/comint.el
index a5fca7ea2a..a71821baa5 100644
--- a/lisp/comint.el
+++ b/lisp/comint.el
@@ -1618,8 +1618,8 @@ comint-history-isearch-pop-state
 (defun comint-within-quotes (beg end)
   "Return t if the number of quotes between BEG and END is odd.
 Quotes are single and double."
-  (let ((countsq (comint-how-many-region "\\(^\\|[^\\\\]\\)'" beg end))
-	(countdq (comint-how-many-region "\\(^\\|[^\\\\]\\)\"" beg end)))
+  (let ((countsq (comint-how-many-region "\\(^\\|[^\\]\\)'" beg end))
+	(countdq (comint-how-many-region "\\(^\\|[^\\]\\)\"" beg end)))
     (or (= (mod countsq 2) 1) (= (mod countdq 2) 1))))
 
 (defun comint-how-many-region (regexp beg end)
diff --git a/lisp/emacs-lisp/checkdoc.el b/lisp/emacs-lisp/checkdoc.el
index dca2f16956..fa6f85c588 100644
--- a/lisp/emacs-lisp/checkdoc.el
+++ b/lisp/emacs-lisp/checkdoc.el
@@ -1511,7 +1511,7 @@ checkdoc-this-string-valid-engine
 					      (line-end-position))))))))
      ;; Continuation of above.  Make sure our sentence is capitalized.
      (save-excursion
-       (skip-chars-forward "\"\\*")
+       (skip-chars-forward "\"*")
        (if (looking-at "[a-z]")
 	   (if (checkdoc-autofix-ask-replace
 		(match-beginning 0) (match-end 0)
diff --git a/lisp/emulation/viper-ex.el b/lisp/emulation/viper-ex.el
index f5090573c8..45b91cd9c0 100644
--- a/lisp/emulation/viper-ex.el
+++ b/lisp/emulation/viper-ex.el
@@ -565,7 +565,7 @@ ex-cmd-complete
   (let (save-pos dist compl-list string-to-complete completion-result)
 
     (save-excursion
-      (setq dist (skip-chars-backward "[a-zA-Z!=>&~]")
+      (setq dist (skip-chars-backward "a-zA-Z!=>&~")
 	    save-pos (point)))
 
     (if (or (= dist 0)
diff --git a/lisp/gnus/gnus-cite.el b/lisp/gnus/gnus-cite.el
index 2216d4a042..7e431e79fc 100644
--- a/lisp/gnus/gnus-cite.el
+++ b/lisp/gnus/gnus-cite.el
@@ -1128,7 +1128,7 @@ gnus-message-search-citation-line
     (let ((cdepth (min (length (apply 'concat
 				      (split-string
 				       (match-string-no-properties 0)
-				       "[ \t [:alnum:]]+")))
+				       "[\t [:alnum:]]+")))
 		       gnus-message-max-citation-depth))
 	  (mlist (make-list (* (1+ gnus-message-max-citation-depth) 2) nil))
 	  (start (point-at-bol))
diff --git a/lisp/gnus/nnir.el b/lisp/gnus/nnir.el
index 7c94abde0d..37a38a58d4 100644
--- a/lisp/gnus/nnir.el
+++ b/lisp/gnus/nnir.el
@@ -1185,7 +1185,7 @@ nnir-imap-delimited-string
 
 (defun nnir-imap-end-of-input ()
   "Are we at the end of input?"
-  (skip-chars-forward "[[:blank:]]")
+  (skip-chars-forward "[:blank:]")
   (looking-at "$"))
 
 
diff --git a/lisp/mail/mail-extr.el b/lisp/mail/mail-extr.el
index ae849d7b62..cb57d8ea01 100644
--- a/lisp/mail/mail-extr.el
+++ b/lisp/mail/mail-extr.el
@@ -293,7 +293,7 @@ mail-extr-all-letters-but-separators
 ;; multipart names.
 ;; #### should . be in here?
 (defconst mail-extr-all-letters
-  (purecopy (concat mail-extr-all-letters-but-separators "---")))
+  (purecopy (concat mail-extr-all-letters-but-separators "-")))
 
 ;; Any character that can start a name.
 ;; Keep this set as minimal as possible.
@@ -305,19 +305,11 @@ mail-extr-last-letters
 
 (defconst mail-extr-leading-garbage "\\W+")
 
-;; (defconst mail-extr-non-name-chars
-;;   (purecopy (concat "^" mail-extr-all-letters ".")))
 ;; (defconst mail-extr-non-begin-name-chars
 ;;   (purecopy (concat "^" mail-extr-first-letters)))
 ;; (defconst mail-extr-non-end-name-chars
 ;;   (purecopy (concat "^" mail-extr-last-letters)))
 
-;; Matches an initial not followed by both a period and a space.
-;; (defconst mail-extr-bad-initials-pattern
-;;   (purecopy
-;;    (format "\\(\\([^%s]\\|\\`\\)[%s]\\)\\(\\.\\([^ ]\\)\\| \\|\\([^%s .]\\)\\|\\'\\)"
-;;            mail-extr-all-letters mail-extr-first-letters mail-extr-all-letters)))
-
 ;; Matches periods used instead of spaces.  Must not match the period
 ;; following an initial.
 (defconst mail-extr-bad-dot-pattern
diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index c5d714846d..df0acbb343 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -2297,7 +2297,7 @@ minibuffer-maybe-quote-filename
   (if (and (not (file-name-quoted-p filename))
            (file-name-absolute-p filename)
            (string-match-p (if (memq system-type '(windows-nt ms-dos))
-                               "[/\\\\]~" "/~")
+                               "[/\\]~" "/~")
                            (file-local-name filename)))
       (file-name-quote filename)
     (minibuffer--double-dollars filename)))
diff --git a/lisp/nxml/rng-nxml.el b/lisp/nxml/rng-nxml.el
index c110937b34..05b59316d1 100644
--- a/lisp/nxml/rng-nxml.el
+++ b/lisp/nxml/rng-nxml.el
@@ -160,7 +160,7 @@ rng-complete-tag
       (and rng-collecting-text (rng-flush-text))
       (let ((target-names (rng-match-possible-start-tag-names)))
         `(,(1+ lt-pos)
-          ,(save-excursion (skip-chars-forward "[[:alnum:]_.-:]") (point))
+          ,(save-excursion (skip-chars-forward "-[:alnum:]_.:") (point))
           ,(apply-partially #'rng-complete-qname-function
                             target-names nil extra-strings)
           :exit-function
@@ -207,7 +207,7 @@ rng-complete-end-tag
 			      (cdar rng-open-elements))
 		    (cdar rng-open-elements))))
              `(,(+ (match-beginning 0) 2)
-               ,(save-excursion (skip-chars-forward "[[:alnum:]_.-:]") (point))
+               ,(save-excursion (skip-chars-forward "-[:alnum:]_.:") (point))
                ,(list start-tag-name)   ;Sole completion candidate.
                :exit-function
                ,(lambda (_completion status)
@@ -247,7 +247,7 @@ rng-complete-attribute-name
 			      "xmlns"))
 			  rng-undeclared-prefixes)))
              `(,attribute-start
-               ,(save-excursion (skip-chars-forward "[[:alnum:]_.-:]") (point))
+               ,(save-excursion (skip-chars-forward "-[:alnum:]_.:") (point))
                ,(apply-partially #'rng-complete-qname-function
                                  target-names t extra-strings)
                :exit-function
diff --git a/lisp/obsolete/vip.el b/lisp/obsolete/vip.el
index 0360db9bca..bc4b90031e 100644
--- a/lisp/obsolete/vip.el
+++ b/lisp/obsolete/vip.el
@@ -2216,7 +2216,7 @@ vip-get-ex-token
 	     (while (and (not (eolp)) cont)
 	       ;;(re-search-forward "[^/]*/")
 	       (re-search-forward "[^/]*\\(/\\|\n\\)")
-	       (if (not (vip-looking-back "[^\\\\]\\(\\\\\\\\\\)*\\\\/"))
+	       (if (not (vip-looking-back "[^\\]\\(\\\\\\\\\\)*\\\\/"))
 		   (setq cont nil))))
 	   (backward-char 1)
 	   (setq ex-token (buffer-substring (point) (mark)))
@@ -2229,7 +2229,7 @@ vip-get-ex-token
 	     (while (and (not (eolp)) cont)
 	       ;;(re-search-forward "[^\\?]*\\?")
 	       (re-search-forward "[^\\?]*\\(\\?\\|\n\\)")
-	       (if (not (vip-looking-back "[^\\\\]\\(\\\\\\\\\\)*\\\\\\?"))
+	       (if (not (vip-looking-back "[^\\]\\(\\\\\\\\\\)*\\\\\\?"))
 		   (setq cont nil))
 	       (backward-char 1)
 	       (if (not (looking-at "\n")) (forward-char 1))))
@@ -2325,7 +2325,7 @@ vip-get-ex-pat
 	    (while (and (not (eolp)) cont)
 	      (re-search-forward "[^/]*\\(/\\|\n\\)")
 	      ;;(re-search-forward "[^/]*/")
-	      (if (not (vip-looking-back "[^\\\\]\\(\\\\\\\\\\)*\\\\/"))
+	      (if (not (vip-looking-back "[^\\]\\(\\\\\\\\\\)*\\\\/"))
 		  (setq cont nil))))
 	  (setq ex-token
 		(if (= (mark) (point)) ""
diff --git a/lisp/org/org-pcomplete.el b/lisp/org/org-pcomplete.el
index b8a2f68759..49983c40a5 100644
--- a/lisp/org/org-pcomplete.el
+++ b/lisp/org/org-pcomplete.el
@@ -82,7 +82,7 @@ org-thing-at-point
 	   (not (equal (char-after (point-at-bol)) ?*))
 	   (save-excursion
 	     (move-beginning-of-line 1)
-	     (skip-chars-backward "[ \t\n]")
+	     (skip-chars-backward " \t\n")
 	     ;; org-drawer-regexp matches a whole line but while
 	     ;; looking-back, we just ignore trailing whitespaces
 	     (or (looking-back (substring org-drawer-regexp 0 -1)
diff --git a/lisp/org/org.el b/lisp/org/org.el
index e3c78ae90d..bf7e305b7a 100644
--- a/lisp/org/org.el
+++ b/lisp/org/org.el
@@ -14953,7 +14953,7 @@ org-set-tags
 	  (unless (equal tags "")
 	    (let* ((level (save-excursion
 			    (beginning-of-line)
-			    (skip-chars-forward "\\*")))
+			    (skip-chars-forward "*")))
 		   (offset (if (bound-and-true-p org-indent-mode)
 			       (* (1- org-indent-indentation-per-level)
 				  (1- level))
@@ -20615,7 +20615,7 @@ org-increase-number-at-point
     (unless inc (setq inc 1))
     (let ((pos (point))
 	  (beg (skip-chars-backward "-+^/*0-9eE."))
-	  (end (skip-chars-forward "-+^/*0-9eE^.")) nap)
+	  (end (skip-chars-forward "-+^/*0-9eE.")) nap)
       (setq nap (buffer-substring-no-properties
 		 (+ pos beg) (+ pos beg end)))
       (delete-region (+ pos beg) (+ pos beg end))
@@ -22837,7 +22837,7 @@ org-setup-filling
 (defun org-fill-line-break-nobreak-p ()
   "Non-nil when a new line at point would create an Org line break."
   (save-excursion
-    (skip-chars-backward "[ \t]")
+    (skip-chars-backward " \t")
     (skip-chars-backward "\\\\")
     (looking-at "\\\\\\\\\\($\\|[^\\]\\)")))
 
diff --git a/lisp/pcomplete.el b/lisp/pcomplete.el
index d4ee2c38fa..ef285db57e 100644
--- a/lisp/pcomplete.el
+++ b/lisp/pcomplete.el
@@ -768,7 +768,7 @@ pcomplete-parse-comint-arguments
 	(push (point) begins)
         (while
             (progn
-              (skip-chars-forward "^ \t\n\\")
+              (skip-chars-forward "^ \t\n")
               (when (eq (char-after) ?\\)
                 (forward-char 1)
                 (unless (eolp)
diff --git a/lisp/progmodes/ada-mode.el b/lisp/progmodes/ada-mode.el
index 59dc1d0fda..e01f1e8ecb 100644
--- a/lisp/progmodes/ada-mode.el
+++ b/lisp/progmodes/ada-mode.el
@@ -767,7 +767,7 @@ ada-compile-goto-error
   (interactive "d")
   (goto-char pos)
 
-  (skip-chars-backward "-a-zA-Z0-9_:./\\")
+  (skip-chars-backward "-a-zA-Z0-9_:./\\\\")
   (cond
    ;;  special case: looking at a filename:line not at the beginning of a line
    ;;  or a simple line reference "at line ..."
diff --git a/lisp/progmodes/cperl-mode.el b/lisp/progmodes/cperl-mode.el
index 970c5669c6..0b6008a511 100644
--- a/lisp/progmodes/cperl-mode.el
+++ b/lisp/progmodes/cperl-mode.el
@@ -3507,18 +3507,18 @@ cperl-look-at-leading-count
 (defsubst cperl-highlight-charclass (endbracket dashface bsface onec-space)
   (let ((l '(1 5 7)) ll lle lll
 	;; 2 groups, the first takes the whole match (include \[trnfabe])
-	(singleChar (concat "\\(" "[^\\\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)")))
+	(singleChar (concat "\\(" "[^\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)")))
     (while				; look for unescaped - between non-classes
 	(re-search-forward
 	 ;; On 19.33, certain simplifications lead
 	 ;; to bugs (as in  [^a-z] \\| [trnfabe]  )
 	 (concat	       		; 1: SingleChar (include \[trnfabe])
 	  singleChar
-	  ;;"\\(" "[^\\\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)"
+	  ;;"\\(" "[^\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)"
 	  "\\("				; 3: DASH SingleChar (match optionally)
 	    "\\(-\\)"			; 4: DASH
 	    singleChar			; 5: SingleChar
-	    ;;"\\(" "[^\\\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)"
+	    ;;"\\(" "[^\\]" "\\|" "\\\\[^cdg-mo-qsu-zA-Z0-9_]" "\\|" "\\\\c." "\\|" "\\\\x" "\\([0-9a-fA-F][0-9a-fA-F]?\\|\\={[0-9a-fA-F]+}\\)" "\\|" "\\\\0?[0-7][0-7]?[0-7]?" "\\|" "\\\\N{[^{}]*}" "\\)"
 	  "\\)?"
 	  "\\|"
 	  "\\("				; 7: other escapes
@@ -4455,13 +4455,13 @@ cperl-find-pods-heres
 			      ;; Apparently, I can't put \] into a charclass
 			      ;; in m]]: m][\\\]\]] produces [\\]]
 ;;;   POSIX?  [:word:] [:^word:] only inside []
-;;;	       "\\=\\(\\\\.\\|[^][\\\\]\\|\\[:\\^?\sw+:]\\|\\[[^:]\\)*]")
+;;;	       "\\=\\(\\\\.\\|[^][\\]\\|\\[:\\^?\sw+:]\\|\\[[^:]\\)*]")
 			      (while	; look for unescaped ]
 				  (and argument
 				       (re-search-forward
 					(if (eq (char-after b) ?\] )
-					    "\\=\\(\\\\[^]]\\|[^]\\\\]\\)*\\\\]"
-					  "\\=\\(\\\\.\\|[^]\\\\]\\)*]")
+					    "\\=\\(\\\\[^]]\\|[^]\\]\\)*\\\\]"
+					  "\\=\\(\\\\.\\|[^]\\]\\)*]")
 					(1- e) 'toend))
 				;; Is this ] an end of POSIX class?
 				(if (save-excursion
@@ -4580,7 +4580,7 @@ cperl-find-pods-heres
 			      ;; Works also if the outside delimiters are ().
 			      (or;;(if (eq (char-after b) ?\) )
 			       ;;(re-search-forward
-			       ;; "[^\\\\]\\(\\\\\\\\\\)*\\\\)"
+			       ;; "[^\\]\\(\\\\\\\\\\)*\\\\)"
 			       ;; (1- e) 'toend)
 			       (search-forward ")" (1- e) 'toend)
 			       ;;)
@@ -7261,7 +7261,7 @@ cperl-not-bad-style-regexp
      ".->"				; a->b
      "->"				; a SPACE ->b
      "\\[-"				; a[-1]
-     "\\\\[&$@*\\\\]"			; \&func
+     "\\\\[&$@*\\]"			; \&func
      "^="				; =head
      "\\$."				; $|
      "<<[a-zA-Z_'\"`]"			; <<FOO, <<'FOO'
@@ -8174,7 +8174,7 @@ cperl-regext-to-level-start
 	  (error "Cannot find `(' which starts a group"))
       (setq done
 	    (save-excursion
-	      (skip-chars-backward "\\")
+	      (skip-chars-backward "\\\\")
 	      (looking-at "\\(\\\\\\\\\\)*(")))
       (or done (forward-char -1)))))
 
diff --git a/lisp/progmodes/ebnf-yac.el b/lisp/progmodes/ebnf-yac.el
index 3365f647c7..9e712353e9 100644
--- a/lisp/progmodes/ebnf-yac.el
+++ b/lisp/progmodes/ebnf-yac.el
@@ -392,7 +392,7 @@ ebnf-yac-lex
 (defun ebnf-yac-skip-spaces ()
   (skip-chars-forward
    (if ebnf-yac-skip-char
-       "\n\r\t !#$&()*+-.0123456789=?@[\\\\]^_`~"
+       "-\n\r\t !#$&()*+,.0123456789=?@[\\\\]^_`~"
      "\n\r\t ")
    ebnf-limit)
   (< (point) ebnf-limit))
diff --git a/lisp/progmodes/flymake-proc.el b/lisp/progmodes/flymake-proc.el
index 2d9dd047a3..dbf7561944 100644
--- a/lisp/progmodes/flymake-proc.el
+++ b/lisp/progmodes/flymake-proc.el
@@ -1133,7 +1133,7 @@ flymake-proc-master-tex-init
   (let* ((temp-master-file-name (flymake-proc--init-create-temp-source-and-master-buffer-copy
                                  'flymake-proc-get-include-dirs-dot 'flymake-proc-create-temp-inplace
 				 '("\\.tex\\'")
-				 "[ \t]*\\in\\(?:put\\|clude\\)[ \t]*{\\(.*%s\\)}")))
+				 "[ \t]*in\\(?:put\\|clude\\)[ \t]*{\\(.*%s\\)}")))
     (when temp-master-file-name
       (flymake-proc--get-tex-args temp-master-file-name))))
 
diff --git a/lisp/progmodes/flymake.el b/lisp/progmodes/flymake.el
index 830d700963..d6cd370dac 100644
--- a/lisp/progmodes/flymake.el
+++ b/lisp/progmodes/flymake.el
@@ -420,7 +420,7 @@ flymake-diag-region
                        (beg)
                        (progn
                          (end-of-line)
-                         (skip-chars-backward " \t\f\t\n" beg)
+                         (skip-chars-backward " \t\f\n" beg)
                          (if (eq (point) beg)
                              (line-beginning-position 2)
                            (point)))))
diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
index b8aa521cf6..152667040f 100644
--- a/lisp/progmodes/fortran.el
+++ b/lisp/progmodes/fortran.el
@@ -1795,7 +1795,7 @@ fortran-current-line-indentation
            (goto-char (match-end 0)))
           (t
            ;; Move past line number.
-           (skip-chars-forward "[ \t0-9]")))
+           (skip-chars-forward " \t0-9")))
     ;; Move past whitespace.
     (skip-chars-forward " \t")
     (current-column)))
diff --git a/lisp/progmodes/idlw-complete-structtag.el b/lisp/progmodes/idlw-complete-structtag.el
index 8a50b9b537..51c9117cd4 100644
--- a/lisp/progmodes/idlw-complete-structtag.el
+++ b/lisp/progmodes/idlw-complete-structtag.el
@@ -128,7 +128,7 @@ idlwave-complete-structure-tag
           ;; x[i+4].name.g*.  But it is complicated because we would have
           ;; to really parse this expression.  For now, we allow only
           ;; substructures, like "aaa.bbb.ccc.ddd"
-	  (skip-chars-backward "[a-zA-Z0-9._$]")
+	  (skip-chars-backward "a-zA-Z0-9._$")
           (setq start (point)) ;; remember the start of the completion pos.
 	  (and (< (point) pos)
 	       (not (equal (char-before) ?!)) ; no sysvars
diff --git a/lisp/progmodes/idlwave.el b/lisp/progmodes/idlwave.el
index 5ff22571b9..bded09d503 100644
--- a/lisp/progmodes/idlwave.el
+++ b/lisp/progmodes/idlwave.el
@@ -7590,7 +7590,7 @@ idlwave-complete-sysvar-or-tag
 	(case-fold-search t))
     (cond ((save-excursion
 	     ;; Check if the context is right for system variable
-	     (skip-chars-backward "[a-zA-Z0-9_$]")
+	     (skip-chars-backward "a-zA-Z0-9_$")
 	     (equal (char-before) ?!))
 	   (setq idlwave-completion-help-info '(idlwave-complete-sysvar-help))
 	   (idlwave-complete-in-buffer 'sysvar 'sysvar
diff --git a/lisp/progmodes/prolog.el b/lisp/progmodes/prolog.el
index 0bc365fcf0..296a7ac3c9 100644
--- a/lisp/progmodes/prolog.el
+++ b/lisp/progmodes/prolog.el
@@ -2826,7 +2826,7 @@ prolog-pred-end
           (progn
             (if (and (eq prolog-system 'mercury)
                      (looking-at
-                      (format ":-[ \t]*\\(pred\\|mode\\)[ \t]+\\(%s+\\)"
+                      (format ":-[ \t]*\\(pred\\|mode\\)[ \t]+\\(\\(?:%s\\)+\\)"
                               prolog-atom-regexp)))
                 ;; Skip predicate declarations
                 (progn
@@ -2950,7 +2950,7 @@ prolog-clause-info
            (predname
             (if (looking-at prolog-atom-char-regexp)
                 (progn
-                  (skip-chars-forward "^ (\\.")
+                  (skip-chars-forward "^ (.")
                   (buffer-substring op (point)))
               ""))
            (arity 0))
diff --git a/lisp/progmodes/ruby-mode.el b/lisp/progmodes/ruby-mode.el
index 5998ac8e39..4fceda8937 100644
--- a/lisp/progmodes/ruby-mode.el
+++ b/lisp/progmodes/ruby-mode.el
@@ -1490,7 +1490,7 @@ ruby-forward-sexp
             (cond ((looking-at "\\?\\(\\\\[CM]-\\)*\\\\?\\S ")
                    (goto-char (match-end 0)))
                   ((progn
-                     (skip-chars-forward ",.:;|&^~=!?\\+\\-\\*")
+                     (skip-chars-forward "-,.:;|&^~=!?+*")
                      (looking-at "\\s("))
                    (goto-char (scan-sexps (point) 1)))
                   ((and (looking-at (concat "\\<\\(" ruby-block-beg-re
@@ -1533,7 +1533,7 @@ ruby-backward-sexp
     (let ((i (or arg 1)))
       (condition-case nil
           (while (> i 0)
-            (skip-chars-backward " \t\n,.:;|&^~=!?\\+\\-\\*")
+            (skip-chars-backward "- \t\n,.:;|&^~=!?+*")
             (forward-char -1)
             (cond ((looking-at "\\s)")
                    (goto-char (scan-sexps (1+ (point)) -1))
@@ -1546,7 +1546,7 @@ ruby-backward-sexp
                   ((looking-at "\\s\"\\|\\\\\\S_")
                    (let ((c (char-to-string (char-before (match-end 0)))))
                      (while (and (search-backward c)
-				 (eq (logand (skip-chars-backward "\\") 1)
+				 (eq (logand (skip-chars-backward "\\\\") 1)
 				     1))))
                    nil)
                   ((looking-at "\\s.\\|\\s\\")
diff --git a/lisp/progmodes/verilog-mode.el b/lisp/progmodes/verilog-mode.el
index f9c3177be4..9e241c70e7 100644
--- a/lisp/progmodes/verilog-mode.el
+++ b/lisp/progmodes/verilog-mode.el
@@ -10494,7 +10494,7 @@ verilog-repair-open-comma
 		      (looking-at "[(,]")))
                (not (save-excursion  ; Not `endif, or user define
 		      (backward-char 1)
-		      (skip-chars-backward "[a-zA-Z0-9_`]")
+		      (skip-chars-backward "a-zA-Z0-9_`")
 		      (looking-at "`"))))
       (insert ","))))
 
diff --git a/lisp/term.el b/lisp/term.el
index 693362cc73..586a887a29 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -1984,8 +1984,8 @@ term-magic-space
 (defun term-within-quotes (beg end)
   "Return t if the number of quotes between BEG and END is odd.
 Quotes are single and double."
-  (let ((countsq (term-how-many-region "\\(^\\|[^\\\\]\\)'" beg end))
-	(countdq (term-how-many-region "\\(^\\|[^\\\\]\\)\"" beg end)))
+  (let ((countsq (term-how-many-region "\\(^\\|[^\\]\\)'" beg end))
+	(countdq (term-how-many-region "\\(^\\|[^\\]\\)\"" beg end)))
     (or (= (mod countsq 2) 1) (= (mod countdq 2) 1))))
 
 (defun term-how-many-region (regexp beg end)
diff --git a/lisp/textmodes/bib-mode.el b/lisp/textmodes/bib-mode.el
index 81dfb6c99c..7a5d3ef775 100644
--- a/lisp/textmodes/bib-mode.el
+++ b/lisp/textmodes/bib-mode.el
@@ -198,7 +198,7 @@ unread-bib
 
 (defvar bib-capitalize-title-stop-words
    (concat
-      "the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|"
+      "the\\|and\\|of\\|is\\|a\\|an\\|for\\|in\\|to\\|on\\|at\\|"
       "by\\|with\\|that\\|its")
    "Words not to be capitalized in a title (unless the first word).")
 
diff --git a/lisp/textmodes/refbib.el b/lisp/textmodes/refbib.el
index f8013c73bb..3ba52e61ea 100644
--- a/lisp/textmodes/refbib.el
+++ b/lisp/textmodes/refbib.el
@@ -137,7 +137,7 @@ r2b-delimit-with-quote
 
 (defvar r2b-capitalize-title-stop-words
    (concat
-      "the\\|and\\|of\\|is\\|a\\|an\\|of\\|for\\|in\\|to\\|in\\|on\\|at\\|"
+      "the\\|and\\|of\\|is\\|a\\|an\\|for\\|in\\|to\\|on\\|at\\|"
       "by\\|with\\|that\\|its")
    "Words not to be capitalized in a title (unless the first word).")
 
diff --git a/lisp/textmodes/reftex-parse.el b/lisp/textmodes/reftex-parse.el
index 2f9b7268fc..005816e965 100644
--- a/lisp/textmodes/reftex-parse.el
+++ b/lisp/textmodes/reftex-parse.el
@@ -1000,7 +1000,7 @@ reftex-nth-arg
                     (eq (following-char) ?\{))
           (cl-incf cnt)))
       (if (and (= n cnt)
-               (> (skip-chars-forward "{\\[") 0))
+               (> (skip-chars-forward "{[") 0))
           (reftex-context-substring)
         nil))))
 
diff --git a/lisp/textmodes/rst.el b/lisp/textmodes/rst.el
index 0fc2f170a6..ba5d7e4f46 100644
--- a/lisp/textmodes/rst.el
+++ b/lisp/textmodes/rst.el
@@ -225,7 +225,7 @@ rst-svn-rev
   "The SVN revision of this file.
 SVN revision is the upstream (docutils) revision.")
 (defconst rst-svn-timestamp
-  (rst-extract-version "\\$" "LastChangedDate: " ".+?+" " "
+  (rst-extract-version "\\$" "LastChangedDate: " ".+" " "
 		       "$LastChangedDate: 2017-01-08 10:54:35 +0100 (Sun, 08 Jan 2017) $")
   "The SVN time stamp of this file.")
 
diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 651a2cc94c..46baa8a148 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -513,7 +513,7 @@ url-http-parse-response
   (url-http-debug "url-http-parse-response called in (%s)" (buffer-name))
   (goto-char (point-min))
   (skip-chars-forward " \t\n")		; Skip any blank crap
-  (skip-chars-forward "HTTP/")		; Skip HTTP Version
+  (skip-chars-forward "/HPT")		; Skip HTTP Version "HTTP/".
   (setq url-http-response-version
 	(buffer-substring (point)
 			  (progn
diff --git a/test/lisp/progmodes/f90-tests.el b/test/lisp/progmodes/f90-tests.el
index 3cd7392bbc..b164238841 100644
--- a/test/lisp/progmodes/f90-tests.el
+++ b/test/lisp/progmodes/f90-tests.el
@@ -98,7 +98,7 @@ f90-test-indent
     (insert "(/ x /)")
     (f90-do-auto-fill)
     (beginning-of-line)
-    (skip-chars-forward "[ \t]")
+    (skip-chars-forward " \t")
     (should (equal "&(/" (buffer-substring (point) (+ 3 (point)))))))
 
 ;; TODO bug#5593
-- 
2.20.1


[-- Attachment #3: mail-extr.txt --]
[-- Type: text/plain, Size: 1569 bytes --]

lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 23)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   .......................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ק' inside character alternative (pos 24)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ר' inside character alternative (pos 26)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ..........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 27)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ...........................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ש' inside character alternative (pos 28)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ............................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 30)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ..............................^
lisp/mail/footnote.el:366:3: In footnote-hebrew-numeric-regex: Duplicated `ת' inside character alternative (pos 32)
  "[אבגדהוזחטיכלמנסעפצקרשתתקתרתשתתתתק']+"
   ................................^

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-19  1:21 ` Paul Eggert
@ 2019-03-19 10:34   ` Mattias Engdegård
  2019-03-20  1:53     ` Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Mattias Engdegård @ 2019-03-19 10:34 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

19 mars 2019 kl. 02.21 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> Thanks for doing that. I fixed all the regular expressions I could
> easily fix, by installing the attached patch into master.

Good work, comments below.

> I saw two false alarms, both in lisp/org/org-datetree.el. Both are of
> the form (re-search-forward (format "[chars%%]+" ...) nil t), in which
> both '%' characters are needed. Perhaps you could tweak the trawler to
> not report these?

If you could stomach some sophistry, I believe the Computer is Right here. The argument to org-datetree--find-create is named REGEX, but these strings are not regexps, merely templates for such. Thus, the argument is misnamed; by changing it to REGEX-TEMPLATE, the complaint goes away.

(By the way, I was appalled to discover that `format' doesn't complain about supernumerary arguments. Its (ab)use in this function indicates that it would be hard to change that behaviour.)

> I assume this was a complete trawl, so that I can ignore the earlier
> scans you emailed (I got behind in looking into them). If not, please
> let me know.

Yes, it was complete.

--- a/lisp/auth-source-pass.el
+++ b/lisp/auth-source-pass.el
...
 (defun auth-source-pass--parse-data (contents)
   "Parse the password-store data in the string CONTENTS and return an alist.
 CONTENTS is the contents of a password-store formatted file."
-  (let ((lines (split-string contents "\\\n" t "\\\s")))
+  (let ((lines (split-string contents "\n" t "\\\s")))
                                               ^^^^
The last argument is also a regexp. Presumably it should be just " ",
or did the author mean any horizontal whitespace?

--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -513,7 +513,7 @@ Return the number of characters removed."
   (url-http-debug "url-http-parse-response called in (%s)" (buffer-name))
   (goto-char (point-min))
   (skip-chars-forward " \t\n")		; Skip any blank crap
-  (skip-chars-forward "HTTP/")		; Skip HTTP Version
+  (skip-chars-forward "/HPT")		; Skip HTTP Version "HTTP/".

It looks rather like the intention was to skip the very string "HTTP/",
but I suppose that will do. A few lines later:

   (setq url-http-response-version
 	(buffer-substring (point)
 			  (progn
			    (skip-chars-forward "[0-9].")
                                                 ^^^^^^
This one should surely not include the brackets.
The HTTP response string is something like "HTTP/1.1 451 CENSORED".





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-19 10:34   ` Mattias Engdegård
@ 2019-03-20  1:53     ` Paul Eggert
  2019-03-20  2:20       ` Stefan Monnier
  2019-03-20 10:04       ` Mattias Engdegård
  0 siblings, 2 replies; 31+ messages in thread
From: Paul Eggert @ 2019-03-20  1:53 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

Thanks for proofreading that. I installed the attached to try to fix the 
problems you noted.

I also fixed a couple of other problems if I happened to run across them. First, 
the regexp [a-b-c] is ambiguous according to POSIX, and should be avoided. 
Second, a regexp like [[:alnum:]-z] is also ambiguous for the same reason. 
Perhaps these regexps currently have a particular behavior in Emacs but it's not 
documented as far as I know and code should avoid them. Perhaps the trawler 
could be modified to catch them.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-More-minor-regex-cleanup.patch --]
[-- Type: text/x-patch; name="0001-More-minor-regex-cleanup.patch", Size: 7639 bytes --]

From e14c0d748efe35afc653151ff18c4dd93dcc456e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 19 Mar 2019 18:45:17 -0700
Subject: [PATCH] More minor regex cleanup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problems reported by Mattias Engdegård in:
https://lists.gnu.org/r/emacs-devel/2019-03/msg00643.html
plus a few others that I noticed.
* lisp/auth-source-pass.el (auth-source-pass--parse-data):
* lisp/org/org-datetree.el (org-datetree--find-create):
* lisp/org/org-pcomplete.el (org-thing-at-point):
* lisp/progmodes/js.el (js--end-of-do-while-loop-p):
* lisp/textmodes/sgml-mode.el:
(sgml-electric-tag-pair-before-change-function):
* lisp/textmodes/texnfo-upd.el (texinfo-menu-copy-old-description):
* lisp/url/url-http.el (url-http-parse-response):
Fix regular expression and similar syntax.
---
 lisp/auth-source-pass.el     | 2 +-
 lisp/org/org-datetree.el     | 9 +++++----
 lisp/org/org-pcomplete.el    | 4 ++--
 lisp/progmodes/cperl-mode.el | 2 +-
 lisp/progmodes/js.el         | 2 +-
 lisp/textmodes/sgml-mode.el  | 7 ++++---
 lisp/textmodes/texnfo-upd.el | 2 +-
 lisp/url/url-http.el         | 2 +-
 8 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/lisp/auth-source-pass.el b/lisp/auth-source-pass.el
index 29ff9c6685..4283ed0392 100644
--- a/lisp/auth-source-pass.el
+++ b/lisp/auth-source-pass.el
@@ -139,7 +139,7 @@ auth-source-pass--parse-secret
 (defun auth-source-pass--parse-data (contents)
   "Parse the password-store data in the string CONTENTS and return an alist.
 CONTENTS is the contents of a password-store formatted file."
-  (let ((lines (split-string contents "\n" t "\\\s")))
+  (let ((lines (split-string contents "\n" t "[ \t]+")))
     (seq-remove #'null
                 (mapcar (lambda (line)
                           (let ((pair (mapcar (lambda (s) (string-trim s))
diff --git a/lisp/org/org-datetree.el b/lisp/org/org-datetree.el
index aea2c8d3d6..b4797de1e5 100644
--- a/lisp/org/org-datetree.el
+++ b/lisp/org/org-datetree.el
@@ -138,15 +138,16 @@ org-datetree-find-iso-week-create
        "^\\*+[ \t]+%d-%02d-\\([0123][0-9]\\) \\w+$"
        year month day))))
 
-(defun org-datetree--find-create (regex year &optional month day insert)
-  "Find the datetree matched by REGEX for YEAR, MONTH, or DAY.
-REGEX is passed to `format' with YEAR, MONTH, and DAY as
+(defun org-datetree--find-create
+    (regex-template year &optional month day insert)
+  "Find the datetree matched by REGEX-TEMPLATE for YEAR, MONTH, or DAY.
+REGEX-TEMPLATE is passed to `format' with YEAR, MONTH, and DAY as
 arguments.  Match group 1 is compared against the specified date
 component.  If INSERT is non-nil and there is no match then it is
 inserted into the buffer."
   (when (or month day)
     (org-narrow-to-subtree))
-  (let ((re (format regex year month day))
+  (let ((re (format regex-template year month day))
 	match)
     (goto-char (point-min))
     (while (and (setq match (re-search-forward re nil t))
diff --git a/lisp/org/org-pcomplete.el b/lisp/org/org-pcomplete.el
index 49983c40a5..cf272de90a 100644
--- a/lisp/org/org-pcomplete.el
+++ b/lisp/org/org-pcomplete.el
@@ -49,10 +49,10 @@ org-thing-at-point
   "Examine the thing at point and let the caller know what it is.
 The return value is a string naming the thing at point."
   (let ((beg1 (save-excursion
-		(skip-chars-backward "[:alnum:]-_@")
+		(skip-chars-backward "-[:alnum:]_@")
 		(point)))
 	(beg (save-excursion
-	       (skip-chars-backward "a-zA-Z0-9-_:$")
+	       (skip-chars-backward "-a-zA-Z0-9_:$")
 	       (point)))
 	(line-to-here (buffer-substring (point-at-bol) (point))))
     (cond
diff --git a/lisp/progmodes/cperl-mode.el b/lisp/progmodes/cperl-mode.el
index 0b6008a511..73b55e29a5 100644
--- a/lisp/progmodes/cperl-mode.el
+++ b/lisp/progmodes/cperl-mode.el
@@ -1884,7 +1884,7 @@ cperl-comment-indent
 ;;Point is at start of real comment."
 ;;  (let ((c (current-column)) target cnt prevc)
 ;;    (if (= c comment-column) nil
-;;      (setq cnt (skip-chars-backward "[ \t]"))
+;;      (setq cnt (skip-chars-backward " \t"))
 ;;      (setq target (max (1+ (setq prevc
 ;;			     (current-column))) ; Else indent at comment column
 ;;		   comment-column))
diff --git a/lisp/progmodes/js.el b/lisp/progmodes/js.el
index f1ec5ceea5..4d91da7334 100644
--- a/lisp/progmodes/js.el
+++ b/lisp/progmodes/js.el
@@ -1908,7 +1908,7 @@ js--end-of-do-while-loop-p
     (save-match-data
       (when (looking-at "\\s-*\\_<while\\_>")
 	(if (save-excursion
-	      (skip-chars-backward "[ \t\n]*}")
+	      (skip-chars-backward " \t\n}")
 	      (looking-at "[ \t\n]*}"))
 	    (save-excursion
 	      (backward-list) (forward-symbol -1) (looking-at "\\_<do\\_>"))
diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index e49144e290..9e3be99af1 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -894,7 +894,7 @@ sgml-electric-tag-pair-before-change-function
   (condition-case err
   (save-excursion
     (goto-char end)
-    (skip-chars-backward "[:alnum:]-_.:")
+    (skip-chars-backward "-[:alnum:]_.:")
     (if (and ;; (<= (point) beg) ; This poses problems for downcase-word.
              (or (eq (char-before) ?<)
                  (and (eq (char-before) ?/)
@@ -902,7 +902,7 @@ sgml-electric-tag-pair-before-change-function
              (null (get-char-property (point) 'text-clones)))
         (let* ((endp (eq (char-before) ?/))
                (cl-start (point))
-               (cl-end (progn (skip-chars-forward "[:alnum:]-_.:") (point)))
+	       (cl-end (progn (skip-chars-forward "-[:alnum:]_.:") (point)))
                (match
                 (if endp
                     (when (sgml-skip-tag-backward 1) (forward-char 1) t)
@@ -919,7 +919,8 @@ sgml-electric-tag-pair-before-change-function
                      (equal (buffer-substring cl-start cl-end)
                             (buffer-substring (point)
                                               (save-excursion
-                                                (skip-chars-forward "[:alnum:]-_.:")
+						(skip-chars-forward
+						 "-[:alnum:]_.:")
                                                 (point))))
                      (or (not endp) (eq (char-after cl-end) ?>)))
             (when clones
diff --git a/lisp/textmodes/texnfo-upd.el b/lisp/textmodes/texnfo-upd.el
index 8c6e23eae4..e960e992a8 100644
--- a/lisp/textmodes/texnfo-upd.el
+++ b/lisp/textmodes/texnfo-upd.el
@@ -642,7 +642,7 @@ texinfo-menu-copy-old-description
   "Return description field of old menu line as string.
 Point must be located just after the node name.  Point left before description.
 Single argument, END-OF-MENU, is position limiting search."
-  (skip-chars-forward "[:.,\t\n ]+")
+  (skip-chars-forward ":.,\t\n ")
   ;; don't copy a carriage return at line beginning with asterisk!
   ;; don't copy @detailmenu or @end menu or @ignore as descriptions!
   ;; do copy a description that begins with an `@'!
diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 46baa8a148..1fbc087073 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -517,7 +517,7 @@ url-http-parse-response
   (setq url-http-response-version
 	(buffer-substring (point)
 			  (progn
-			    (skip-chars-forward "[0-9].")
+			    (skip-chars-forward "0-9.")
 			    (point))))
   (setq url-http-response-status (read (current-buffer))))
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20  1:53     ` Paul Eggert
@ 2019-03-20  2:20       ` Stefan Monnier
  2019-03-20 22:01         ` Paul Eggert
                           ` (2 more replies)
  2019-03-20 10:04       ` Mattias Engdegård
  1 sibling, 3 replies; 31+ messages in thread
From: Stefan Monnier @ 2019-03-20  2:20 UTC (permalink / raw)
  To: emacs-devel

> I also fixed a couple of other problems if I happened to run across
> them. First, the regexp [a-b-c] is ambiguous according to POSIX,

Maybe according to POSIX, but not according to Emacs Lisp's reference
manual:

    To include a ‘]’ in a character alternative, you must make it the
    first character.  For example, ‘[]a]’ matches ‘]’ or ‘a’.  To
    include a ‘-’, write ‘-’ as the first or last character of the
    character alternative, or put it after a range.  Thus, ‘[]-]’
    matches both ‘]’ and ‘-’.  (As explained below, you cannot use ‘\]’
    to include a ‘]’ inside a character alternative, since ‘\’ is not
    special there.)

> and should be avoided.

Maybe I could go along with that.
[ I wish Elisp regexps just used \ as an escape char in ranges instead
  of relying on those special convention of where `-` and `]` can appear
  in order to count as themselves.  ]

> Second, a regexp like [[:alnum:]-z] is also ambiguous for the
> same reason.

I think the doc above was written for we had those char classes, but if
[a-b-c] is not ambiguous then I think it's only natural to declare that
[[:alnum:]-z] is not ambiguous either.

> Perhaps these regexps currently have a particular behavior in
> Emacs but it's not documented as far as I know and code should avoid
> them.  Perhaps the trawler could be modified to catch them.

I wonder why the doc doesn't just say that `-` should be the last
character and not mention the other possibilities which just make the
rule unnecessarily complex.


        Stefan




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20  1:53     ` Paul Eggert
  2019-03-20  2:20       ` Stefan Monnier
@ 2019-03-20 10:04       ` Mattias Engdegård
  1 sibling, 0 replies; 31+ messages in thread
From: Mattias Engdegård @ 2019-03-20 10:04 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

20 mars 2019 kl. 02.53 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> Thanks for proofreading that. I installed the attached to try to fix the problems you noted.

That looks fine. You found more skip-chars-* rubbish to clean up, good! Turn any stone...

> I also fixed a couple of other problems if I happened to run across them. First, the regexp [a-b-c] is ambiguous according to POSIX, and should be avoided. Second, a regexp like [[:alnum:]-z] is also ambiguous for the same reason. Perhaps these regexps currently have a particular behavior in Emacs but it's not documented as far as I know and code should avoid them. Perhaps the trawler could be modified to catch them.

Right, the rules are muddy; they had to be reverse-engineered for xr. I was a bit disappointed not to find any [a-[:alnum:]] anywhere though; that would have been a sight.
I'll see what can be done -- it's always a matter of balance to avoid false positives.

By the way, my little diatribe about format arguments led to a mechanised search for argument count violations. One found, in eshell, pushed to master as obvious:

diff --git a/lisp/eshell/esh-proc.el b/lisp/eshell/esh-proc.el
index 55251f5bfb..3432582cf4 100644
--- a/lisp/eshell/esh-proc.el
+++ b/lisp/eshell/esh-proc.el
@@ -498,7 +498,7 @@ See the variable `eshell-kill-processes-on-exit'."
                                        (buffer-name))))
          (eshell-round-robin-kill
           (if (eq eshell-kill-processes-on-exit 'every)
-              (format-message "Kill Eshell child process `%s'? "))))
+              "Kill Eshell child process `%s'? ")))
       (let ((buf (get-buffer "*Process List*")))
        (if (and buf (buffer-live-p buf))
            (kill-buffer buf)))




^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20  2:20       ` Stefan Monnier
@ 2019-03-20 22:01         ` Paul Eggert
  2019-03-20 22:59           ` Drew Adams
                             ` (2 more replies)
  2019-03-21  2:07         ` Richard Stallman
  2019-03-22 13:26         ` Stephen Leake
  2 siblings, 3 replies; 31+ messages in thread
From: Paul Eggert @ 2019-03-20 22:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Mattias Engdegård, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2182 bytes --]

On 3/19/19 7:20 PM, Stefan Monnier wrote:
> I wonder why the doc doesn't just say that `-` should be the last
> character and not mention the other possibilities which just make the
> rule unnecessarily complex.


'-' can also be the first character in a regular expression; this is
pretty common and is standard. POSIX also says '-' can be the upper
bound of a range, which is a bit weird (but hey! it's standard).

I went through the documentation and attempted to fix the doc to
describe this mess better by installing the attached patch into the
emacs-26 branch. The basic ideas are:

* The doc already says that regular expressions like "*foo" and "+foo"
are problematic (they're confusing, and POSIX says the behavior is
undefined) and should be avoided. REs like "[a-m-z]" and "[!-[:alpha:]]"
and "[[:alpha:]-~]" are problematic in the same way and also should be
avoided.

* The doc doesn't clearly say when the Emacs range behavior is an
extension to POSIX; saying this will help people know better when they
can export Emacs regular expressions to other programs.

* The doc is confused (and there's a comment about this) about what
happens when one end of a range is unibyte and the other is multibyte. I
added something saying that if one bound is a raw 8-bit byte then the
other should be a unibyte character (either ASCII, or a raw 8-bit byte).
I don't see any good way to specify the behavior when one bound is a raw
8-bit byte and the other bound is a multibyte character, in such a way
that it's a natural extension of the documented behavior, so the
documentation now recommends against that.

* We might as well go ahead and say that [b-a] matches nothing, as
enough code (ab)uses regexps in that way, and there is value in having a
simple regular expression that always fails to match. However, I expect
that we should say that users should avoid wilder examples like [~-!] so
that the trawler can catch them as typos.

These new recommendations ("should"s in the attached patch) will give
the trawler license to diagnose questionable REs like "[a-m-z]",
"[!-[:alpha:]]", "[~-!]", and (my favorite) "[\u00FF-\xFF]". There is no
change to actual Emacs behavior.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Say-which-regexp-ranges-should-be-avoided.patch --]
[-- Type: text/x-patch; name="0001-Say-which-regexp-ranges-should-be-avoided.patch", Size: 5218 bytes --]

From 981bd72cb5fee582067a691cc0de94c6b6fd1f1d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 20 Mar 2019 14:43:30 -0700
Subject: [PATCH] Say which regexp ranges should be avoided
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/lispref/searching.texi (Regexp Special): Say that
regular expressions like "[a-m-z]" and "[[:alpha:]-~]" should
be avoided, for the same reason that regular expressions like
"+" and "*" should be avoided: POSIX says their behavior is
undefined, and they are confusing anyway.  Also, explain
better what happens when the bound of a range is a raw 8-bit
byte; the old explanation appears to have been obsolete
anyway.  Finally, say that ranges like "[\u00FF-\xFF]" that
mix non-ASCII characters and raw 8-bit bytes should be
avoided, since it’s not clear what they should mean.
---
 doc/lispref/searching.texi | 54 ++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 7546863dde..0cf527b6ac 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -391,25 +391,18 @@ Regexp Special
 Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
 Ranges may be intermixed freely with individual characters, as in
 @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
-or @samp{$}, @samp{%} or period.
+or @samp{$}, @samp{%} or period.  However, the ending character of one
+range should not be the starting point of another one; for example,
+@samp{[a-m-z]} should be avoided.
 
-If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
-matches upper-case letters.  Note that a range like @samp{[a-z]} is
-not affected by the locale's collation sequence, it always represents
-a sequence in @acronym{ASCII} order.
-@c This wasn't obvious to me, since, e.g., the grep manual "Character
-@c Classes and Bracket Expressions" specifically notes the opposite
-@c behavior.  But by experiment Emacs seems unaffected by LC_COLLATE
-@c in this regard.
-
-Note also that the usual regexp special characters are not special inside a
+The usual regexp special characters are not special inside a
 character alternative.  A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
 
 To include a @samp{]} in a character alternative, you must make it the
 first character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.
 To include a @samp{-}, write @samp{-} as the first or last character of
-the character alternative, or put it after a range.  Thus, @samp{[]-]}
+the character alternative, or as the upper bound of a range.  Thus, @samp{[]-]}
 matches both @samp{]} and @samp{-}.  (As explained below, you cannot
 use @samp{\]} to include a @samp{]} inside a character alternative,
 since @samp{\} is not special there.)
@@ -417,13 +410,34 @@ Regexp Special
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
-@c What if it starts with a multibyte and ends with a unibyte?
-@c That doesn't seem to match anything...?
-If a range starts with a unibyte character @var{c} and ends with a
-multibyte character @var{c2}, the range is divided into two parts: one
-spans the unibyte characters @samp{@var{c}..?\377}, the other the
-multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
-first character of the charset to which @var{c2} belongs.
+The following aspects of ranges are specific to Emacs, in that POSIX
+allows but does not require this behavior and programs other than
+Emacs may behave differently:
+
+@enumerate
+@item
+If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
+matches upper-case letters.
+
+@item
+A range is not affected by the locale's collation sequence: it always
+represents the set of characters with codepoints ranging between those
+of its bounds, so that @samp{[a-z]} matches only ASCII letters, even
+outside the C or POSIX locale.
+
+@item
+As a special case, if either bound of a range is a raw 8-bit byte, the
+other bound should be a unibyte character, and the range matches only
+unibyte characters.
+
+@item
+If the lower bound of a range is greater than its upper bound, the
+range is empty and represents no characters.  Thus, @samp{[b-a]}
+always fails to match, and @samp{[^b-a]} matches any character,
+including newline.  However, the lower bound should be at most one
+greater than the upper bound; for example, @samp{[c-a]} should be
+avoided.
+@end enumerate
 
 A character alternative can also specify named character classes
 (@pxref{Char Classes}).  This is a POSIX feature.  For example,
@@ -431,6 +445,8 @@ Regexp Special
 Using a character class is equivalent to mentioning each of the
 characters in that class; but the latter is not feasible in practice,
 since some classes include thousands of different characters.
+A character class should not appear as the lower or upper bound
+of a range.
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: Scan of regexps in Emacs (March 17)
  2019-03-20 22:01         ` Paul Eggert
@ 2019-03-20 22:59           ` Drew Adams
  2019-03-20 23:10             ` Paul Eggert
       [not found]             ` <<deeccd91-0f43-c329-6087-17435550b328@cs.ucla.edu>
  2019-03-21  0:57           ` Stefan Monnier
  2019-03-21 11:15           ` Mattias Engdegård
  2 siblings, 2 replies; 31+ messages in thread
From: Drew Adams @ 2019-03-20 22:59 UTC (permalink / raw)
  To: Paul Eggert, Stefan Monnier; +Cc: Mattias Engdegård, emacs-devel

(Caveat: I have not been following this thread, and I
have not looked at your change.)

> These new recommendations ("should"s in the attached patch)...
            ^^^^^^^^^^^^^^^   ^^^^^^

*Should* is a weasel word that should (!) generally
be avoided in technical doc (including user guides).

Either say "must" (e.g. if we raise an error otherwise)
or say that we "recommend"...  (And extra points for
saying _why_ it's recommended, something that users
generally deserve to understand.)


https://en.wikipedia.org/wiki/Weasel_word



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20 22:59           ` Drew Adams
@ 2019-03-20 23:10             ` Paul Eggert
  2019-03-21  3:38               ` Eli Zaretskii
       [not found]             ` <<deeccd91-0f43-c329-6087-17435550b328@cs.ucla.edu>
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2019-03-20 23:10 UTC (permalink / raw)
  To: Drew Adams; +Cc: emacs-devel

On 3/20/19 3:59 PM, Drew Adams wrote:
> *Should* is a weasel word that should (!) generally
> be avoided

2 points for the meta-"should" in your comment.

As the Emacs documentation currently has thousands of occurrences of
"should", it sounds like we don't generally follow the guideline you're
suggesting.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20 22:01         ` Paul Eggert
  2019-03-20 22:59           ` Drew Adams
@ 2019-03-21  0:57           ` Stefan Monnier
  2019-03-21 11:15           ` Mattias Engdegård
  2 siblings, 0 replies; 31+ messages in thread
From: Stefan Monnier @ 2019-03-21  0:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Mattias Engdegård, emacs-devel

>> I wonder why the doc doesn't just say that `-` should be the last
>> character and not mention the other possibilities which just make the
>> rule unnecessarily complex.
> '-' can also be the first character in a regular expression; this is
> pretty common and is standard.

What I meant is that we should simply tell people to always put it at
the end.  AFAIK you can always put it at the end, so it's a simpler
rule to follow.  That doesn't mean we have to reject a `-` in other
positions (so we can also accept the usual other rule, but we don't
need to advertise it as much as the main rule).


        Stefan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20  2:20       ` Stefan Monnier
  2019-03-20 22:01         ` Paul Eggert
@ 2019-03-21  2:07         ` Richard Stallman
  2019-03-22 13:26         ` Stephen Leake
  2 siblings, 0 replies; 31+ messages in thread
From: Richard Stallman @ 2019-03-21  2:07 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > [ I wish Elisp regexps just used \ as an escape char in ranges instead
  >   of relying on those special convention of where `-` and `]` can appear
  >   in order to count as themselves.  ]

It might have been better, but we can't change it.  Even when first
implementing this in Emacs, I was following historical practice which
was later codified into standards.

  > I wonder why the doc doesn't just say that `-` should be the last
  > character and not mention the other possibilities which just make the
  > rule unnecessarily complex.

It could be reasonable to document that putting it at the beginning or
the end is the right way.


-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20 23:10             ` Paul Eggert
@ 2019-03-21  3:38               ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2019-03-21  3:38 UTC (permalink / raw)
  To: Paul Eggert; +Cc: drew.adams, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 20 Mar 2019 16:10:39 -0700
> Cc: emacs-devel@gnu.org
> 
> As the Emacs documentation currently has thousands of occurrences of
> "should", it sounds like we don't generally follow the guideline you're
> suggesting.

Indeed.  And the reason is likely that our documentation is not
"technical documentation" in the sense meant in that article.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Scan of regexps in Emacs (March 17)
       [not found]               ` <<83d0mk6go5.fsf@gnu.org>
@ 2019-03-21  4:21                 ` Drew Adams
  2019-03-21 14:17                   ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Drew Adams @ 2019-03-21  4:21 UTC (permalink / raw)
  To: Eli Zaretskii, Paul Eggert; +Cc: drew.adams, emacs-devel

> > As the Emacs documentation currently has thousands of occurrences of
> > "should", it sounds like we don't generally follow the guideline you're
> > suggesting.

If they're accompanied by stating _why_ you should
then there's generally no problem understanding.

Call that a scoped/qualified "should" - not really
a problem, because the follow-up explains (by
describing consequences) what we mean there by
"should".
 
> Indeed.  And the reason is likely that our documentation is not
> "technical documentation" in the sense meant in that article.

There was no article.  Just my opinion.  (The URL
was just to explain the term "weasel word".)

And in my opinion our doc _is_ technical doc.

If we don't say why you should then it's usually
preferable (IMO) to say that we "recommend" that
you do XYZ than to say you "should" do XYZ.

In any case, it's usually helpful to say why.

The problem with an empty "should" is that it's
vague, and it can be misunderstood as a (possibly
shame-faced) "must".  (Shame-faced ~ weasel.)

Something that's presented as an unexplained
recommendation is at least seldom confused with a
requirement (a "must").  "Should" is more vague
and is often understood as stronger than just
"it's a good idea" (a recommendation).

But "should", "recommend", and "must" all typically
beg the question of what happens if you don't.
With "should", in particular, it's also typically
unclear how important it is that you do what's
suggested.

"Must" is pretty clearly important - a requirement
of some sort.  But for "must" too it helps to say
what happens if you don't respect the rule.

To be clear, I'm not saying it's appropriate or
enough to just replace all occurrences of "should",
or that all such occurrences are unwise.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20 22:01         ` Paul Eggert
  2019-03-20 22:59           ` Drew Adams
  2019-03-21  0:57           ` Stefan Monnier
@ 2019-03-21 11:15           ` Mattias Engdegård
  2019-04-02  7:33             ` Paul Eggert
  2 siblings, 1 reply; 31+ messages in thread
From: Mattias Engdegård @ 2019-03-21 11:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 4250 bytes --]

20 mars 2019 kl. 23.01 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> On 3/19/19 7:20 PM, Stefan Monnier wrote:
>> I wonder why the doc doesn't just say that `-` should be the last
>> character and not mention the other possibilities which just make the
>> rule unnecessarily complex.

Agreed, that is what the 'how to write regexps' part of the docs should say. But don't we also need a precise description of exactly how they are interpreted by the engine? Otherwise, a user cannot read and understand existing code. (Unless he or she uses xr!) Perhaps there needs to be a separate 'gritty details' section.

> * The doc already says that regular expressions like "*foo" and "+foo"
> are problematic (they're confusing, and POSIX says the behavior is
> undefined) and should be avoided. REs like "[a-m-z]" and "[!-[:alpha:]]"
> and "[[:alpha:]-~]" are problematic in the same way and also should be
> avoided.

I'm with Stefan here; `-' should go last. Anything else is a gritty detail.

> * The doc doesn't clearly say when the Emacs range behavior is an
> extension to POSIX; saying this will help people know better when they
> can export Emacs regular expressions to other programs.

Documenting differences from POSIX regexps is useful. Do you prefer having those differences being spread out, or all concentrated into one section?

These days, a user may be more familiar with the various PCRE dialects than traditional or extended POSIX. Should that be taken into account?

> * The doc is confused (and there's a comment about this) about what
> happens when one end of a range is unibyte and the other is multibyte. I
> added something saying that if one bound is a raw 8-bit byte then the
> other should be a unibyte character (either ASCII, or a raw 8-bit byte).
> I don't see any good way to specify the behavior when one bound is a raw
> 8-bit byte and the other bound is a multibyte character, in such a way
> that it's a natural extension of the documented behavior, so the
> documentation now recommends against that.

The terminology is a bit confusing. Is 'raw 8-bit byte' included in 'unibyte'? Is \x7f ever a raw 8-bit byte?
I agree that [å-\xff], say, should be invalid but I've never seen such constructs.

> * We might as well go ahead and say that [b-a] matches nothing, as
> enough code (ab)uses regexps in that way, and there is value in having a
> simple regular expression that always fails to match. However, I expect
> that we should say that users should avoid wilder examples like [~-!] so
> that the trawler can catch them as typos.

It already does, and some bugs were found that way. As a special case, it no longer complains about z-a because that is unlikely to be an accident and occurs in some code on purpose.

I'm not sure it's a good idea to document reversed ranges as a recommended way to match any or no character (although the description of the semantics would belong in a 'gritty details' section), and only to use [Y-X] where Y=X+1. More about that in a separate post.

> These new recommendations ("should"s in the attached patch) will give
> the trawler license to diagnose questionable REs like "[a-m-z]",
> "[!-[:alpha:]]", "[~-!]", and (my favorite) "[\u00FF-\xFF]". There is no
> change to actual Emacs behavior.

As an experiment, I added detection of 'chained' ranges like [a-m-z] to xr and found a handful in both Emacs and GNU ELPA, but none of them carried a freeload of bugs. Keeping that check didn't seem worthwhile; the regexps may be a bit odd-looking, but aren't wrong.

[!-[:alpha:]] is already detected since xr parses it correctly and will complain about the duplication of ':'. The reverse, [[:digit:]-z], is seen occasionally but again does not seem to be a serious bug proxy.

Much as I would like to outlaw ranges where a typical programmer has to consult an ASCII table to understand what's included, they just seem too common, with too many false positives, to merit inclusion in xr.
Nevertheless I had a quick look and extracted a few that might merit attention; see attachment.

Similarly, a rule finding [X-Y] where Y=X+1 found one or two questionable cases in a sea of false positives (also in the attachment).

[-- Attachment #2: possibly-broken-regexps.log --]
[-- Type: application/octet-stream, Size: 992 bytes --]

/Users/mattias/emacs/lisp/vc/diff-mode.el:2215:19: In call to re-search-forward: Unintuitive range `+-<' (pos 3)
  "\n[!+-<>]\\(-- [0-9]+\\(,[0-9]+\\)? ----\n\\( .*\n\\)*[+]\\)?"
   ....^
/Users/mattias/emacs/lisp/speedbar.el:2852:42: In call to re-search-forward: Unintuitive range `+-?' (pos 21)
  "^\\([0-9]+\\):\\s-*[[<][+-?][]>] "
   ........................^
/Users/mattias/emacs/lisp/speedbar.el:2903:42: In call to re-search-forward: Unintuitive range `+-?' (pos 19)
  "^\\([0-9]+\\):\\s-*\\[[+-?]\\] "
   .......................^
/Users/mattias/emacs/lisp/woman.el:3514:26: In call to looking-at: Unintuitive range `+-/' (pos 1)
  "[+-/*%]"
   .^
/Users/mattias/emacs/lisp/net/webjump.el:345:39: In call to string-match: Two-character range `.-/' (pos 8)
  "[a-zA-Z_.-/]"
   ........^
/Users/mattias/emacs/lisp/align.el:386:3: In align-rules-list (perl-assignment): Two-character range `*-+' (pos 6)
  "[^=!^&*-+<>/| \t\n]\\(\\s-*\\)=[~>]?\\(\\s-*\\)\\([^>= \t\n]\\|$\\)"
   ......^

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-21  4:21                 ` Drew Adams
@ 2019-03-21 14:17                   ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2019-03-21 14:17 UTC (permalink / raw)
  To: Drew Adams; +Cc: eggert, emacs-devel

> Date: Wed, 20 Mar 2019 21:21:38 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: drew.adams@oracle.com, emacs-devel@gnu.org
> 
> The problem with an empty "should" is that it's
> vague, and it can be misunderstood as a (possibly
> shame-faced) "must".  (Shame-faced ~ weasel.)

That is true for documents that describe requirements of some sort.
That's not what our documentation does.

> But "should", "recommend", and "must" all typically
> beg the question of what happens if you don't.

That's an orthogonal issue, it exists regardless of the word you
choose.

> With "should", in particular, it's also typically
> unclear how important it is that you do what's
> suggested.

I disagree, but I do think that if the text begs such a question, it
should also answer it.  Again, this is regardless of which word is
used.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-20  2:20       ` Stefan Monnier
  2019-03-20 22:01         ` Paul Eggert
  2019-03-21  2:07         ` Richard Stallman
@ 2019-03-22 13:26         ` Stephen Leake
  2019-03-22 14:03           ` Stefan Monnier
  2019-03-22 14:12           ` Mattias Engdegård
  2 siblings, 2 replies; 31+ messages in thread
From: Stephen Leake @ 2019-03-22 13:26 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I wonder why the doc doesn't just say that `-` should be the last
> character and not mention the other possibilities which just make the
> rule unnecessarily complex.

I recently ran into a problem with this; I had a regexp with `-' at the
end, then edited it to add another range, without noticing the trailing
`-':

before: [0-9a-zA-Z_-]
after: [0-9a-zA-Z_-\x80-\U0010FFFF]

This is _not_ what I wanted.

So putting it first is better. It does not conflict with putting `]'
first;

[]-...]

is unambiguous.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-22 13:26         ` Stephen Leake
@ 2019-03-22 14:03           ` Stefan Monnier
  2019-03-22 14:12           ` Mattias Engdegård
  1 sibling, 0 replies; 31+ messages in thread
From: Stefan Monnier @ 2019-03-22 14:03 UTC (permalink / raw)
  To: emacs-devel

> So putting it first is better. It does not conflict with putting `]' first;

It does.

> []-...]

It's not first: it's second!

And of course, if you go and add another char:

    []!-...]

then it will fail as well, so having it "first" doesn't really protect it
from the problem you encountered when it was last (tho admittedly, it
does if you add a range rather than a single char).

Anyway, for that shed I vote for green with pink dots,


        Stefan




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-22 13:26         ` Stephen Leake
  2019-03-22 14:03           ` Stefan Monnier
@ 2019-03-22 14:12           ` Mattias Engdegård
  1 sibling, 0 replies; 31+ messages in thread
From: Mattias Engdegård @ 2019-03-22 14:12 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

22 mars 2019 kl. 14.26 skrev Stephen Leake <stephen_leake@stephe-leake.org>:
> 
> I recently ran into a problem with this; I had a regexp with `-' at the
> end, then edited it to add another range, without noticing the trailing
> `-':
> 
> before: [0-9a-zA-Z_-]
> after: [0-9a-zA-Z_-\x80-\U0010FFFF]

(Beside the point really, but [\x80-\U0010FFFF] is an empty set since that \x80 is a raw byte with code 0x3fff80.
Presumably you meant \u0080-\U0010FFFF.)

> So putting it first is better. It does not conflict with putting `]'
> first;
> 
> []-...]
> 
> is unambiguous.

[]-x] means the range ] to x, not the set of ], - and x.





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-03-21 11:15           ` Mattias Engdegård
@ 2019-04-02  7:33             ` Paul Eggert
  2019-04-02 14:15               ` Mattias Engdegård
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2019-04-02  7:33 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2902 bytes --]

Mattias Engdegård wrote:
 >
> don't we also need a precise description of exactly how they are interpreted by the engine?

In other parts of Emacs, we are typically OK with specs that don't completely 
specify behavior. This gives us more freedom to make changes in the undocumented 
behavior later. I think it makes sense to do that here too, for regular 
expressions like "[z-a-m]" that most readers would find confusing.

> I'm with Stefan here; `-' should go last. Anything else is a gritty detail.

Stefan already changed the doc in master to say that. The attached patch 
tightens up the wording (and still says that "-" should go last).

> Documenting differences from POSIX regexps is useful. Do you prefer having those differences being spread out, or all concentrated into one section?

I don't have a strong preference. I wrote it concentrated originally, and that 
form seems to work well.

> These days, a user may be more familiar with the various PCRE dialects than traditional or extended POSIX. Should that be taken into account?

It might be helpful. However, PCRE is further away from Emacs regexps than POSIX 
is, and a comparison of PCRE and POSIX regexps is probably best put into a 
different section. It's not a section I'd like to write, to be honest; PCRE is 
pretty hairy.

> The terminology is a bit confusing. Is 'raw 8-bit byte' included in 'unibyte'? Is \x7f ever a raw 8-bit byte?
> I agree that [å-\xff], say, should be invalid but I've never seen such constructs.

After looking into it I realized that I don't really know the semantics here 
(the text I recently added there seems to be wrong, in some cases), and I have 
my doubts that anyone else knows the semantics either. The attached patch simply 
gets rid of that section, leaving the area undocumented. User beware!

> It already does, and some bugs were found that way. As a special case, it no longer complains about z-a because that is unlikely to be an accident and occurs in some code on purpose.

OK, then we should document z-a as the preferred syntax (best go with the 
flow...). Done in the attached patch.

> As an experiment, I added detection of 'chained' ranges like [a-m-z] to xr and found a handful in both Emacs and GNU ELPA, but none of them carried a freeload of bugs. Keeping that check didn't seem worthwhile; the regexps may be a bit odd-looking, but aren't wrong.

It depends on what one means by "wrong". If one wants to use the ranges in both 
Emacs and grep they are "wrong", so it's reasonable for the manual to recommend 
against them.
> a rule finding [X-Y] where Y=X+1 found one or two questionable cases in a sea of false positives (also in the attachment).

It might also help for the trawler to warn about [X-Z] where Z = X+2. [XYZ] is 
clearer and less error-prone than [X-Z]. I shoehorned that into the attached 
patch too.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-More-regexp-advice-and-clarifications.patch --]
[-- Type: text/x-patch; name="0001-More-regexp-advice-and-clarifications.patch", Size: 4951 bytes --]

From 076ed98ff6d7debff3929beab048c8a90e48dbb8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 2 Apr 2019 00:17:37 -0700
Subject: [PATCH] More regexp advice and clarifications
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/lispref/searching.texi (Regexp Special): Simplify style
advice for order of ], ^, and - in character alternatives.
Stick with saying that it’s not a good idea to put ‘-’ after a
range.  Remove the special case about raw 8-bit bytes and
unibyte characters, as this documentation is confusing and
seems to be incorrect in some cases.  Say that z-a is the
preferred style for reversed ranges, since it’s clearer and is
typically what’s used in practice.  Mention some bad styles:
duplicates in character alternatives, ranges that denote <=3
characters, and ‘-’ as the first character.
---
 doc/lispref/searching.texi | 52 +++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 748ab586af..72ee9233a3 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -398,17 +398,11 @@ Regexp Special
 The usual regexp special characters are not special inside a
 character alternative.  A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
-
-To include a @samp{]} in a character alternative, you must make it the first
-character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To include
-a @samp{-}, write @samp{-} as the last character of the character alternative,
-tho you can also put it first or after a range.  Thus, @samp{[]-]} matches both
-@samp{]} and @samp{-}.  (As explained below, you cannot use @samp{\]} to
-include a @samp{]} inside a character alternative, since @samp{\} is not
-special there.)
-
-To include @samp{^} in a character alternative, put it anywhere but at
-the beginning.
+To include @samp{]} in a character alternative, put it at the
+beginning.  To include @samp{^}, put it anywhere but at the beginning.
+To include @samp{-}, put it at the end.  Thus, @samp{[]^-]} matches
+all three of these special characters.  You cannot use @samp{\} to
+escape these three characters, since @samp{\} is not special here.
 
 The following aspects of ranges are specific to Emacs, in that POSIX
 allows but does not require this behavior and programs other than
@@ -426,17 +420,33 @@ Regexp Special
 outside the C or POSIX locale.
 
 @item
-As a special case, if either bound of a range is a raw 8-bit byte, the
-other bound should be a unibyte character, and the range matches only
-unibyte characters.
+If the lower bound of a range is greater than its upper bound, the
+range is empty and represents no characters.  Thus, @samp{[z-a]}
+always fails to match, and @samp{[^z-a]} matches any character,
+including newline.  However, a reversed range should always be from
+the letter @samp{z} to the letter @samp{a} to make it clear that it is
+not a typo; for example, @samp{[+-*/]} should be avoided, because it
+matches only @samp{/} rather than the likely-intended four characters.
+@end enumerate
+
+Some kinds of character alternatives are not the best style even
+though they are standardized by POSIX and are portable.  They include:
 
+@enumerate
 @item
-If the lower bound of a range is greater than its upper bound, the
-range is empty and represents no characters.  Thus, @samp{[b-a]}
-always fails to match, and @samp{[^b-a]} matches any character,
-including newline.  However, the lower bound should be at most one
-greater than the upper bound; for example, @samp{[c-a]} should be
-avoided.
+A character alternative can include duplicates.  For example,
+@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
+
+@item
+A range can denote just one, two, or three characters.  For example,
+@samp{[(-(]} is less clear than @samp{[(]}, @samp{[*-+]} is less clear
+than @samp{[*+]}, and @samp{[*-,]} is less clear than @samp{[*+,]}.
+
+@item
+A @samp{-} also appear at the beginning of a character alternative, or
+as the upper bound of a range.  For example, although @samp{[-a-z]} is
+valid, @samp{[a-z-]} is better style; and although @samp{[!--/]} is
+valid, @samp{[!-,/-]} is clearer.
 @end enumerate
 
 A character alternative can also specify named character classes
@@ -452,7 +462,7 @@ Regexp Special
 @cindex @samp{^} in regexp
 @samp{[^} begins a @dfn{complemented character alternative}.  This
 matches any character except the ones specified.  Thus,
-@samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and
+@samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and
 digits.
 
 @samp{^} is not special in a character alternative unless it is the first
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02  7:33             ` Paul Eggert
@ 2019-04-02 14:15               ` Mattias Engdegård
  2019-04-02 14:26                 ` Noam Postavsky
                                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Mattias Engdegård @ 2019-04-02 14:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]

2 apr. 2019 kl. 09.33 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
>> don't we also need a precise description of exactly how they are interpreted by the engine?
> 
> In other parts of Emacs, we are typically OK with specs that don't completely specify behavior. This gives us more freedom to make changes in the undocumented behavior later. I think it makes sense to do that here too, for regular expressions like "[z-a-m]" that most readers would find confusing.

Then where does a user go to understand extant regexps? (Do we have any latitude at all for changing even obscure corners of regexp syntax and semantics today?) That's why I favour expounding on the details in a separate section.

>> The terminology is a bit confusing. Is 'raw 8-bit byte' included in 'unibyte'? Is \x7f ever a raw 8-bit byte?
>> I agree that [å-\xff], say, should be invalid but I've never seen such constructs.
> 
> After looking into it I realized that I don't really know the semantics here (the text I recently added there seems to be wrong, in some cases), and I have my doubts that anyone else knows the semantics either. The attached patch simply gets rid of that section, leaving the area undocumented. User beware!

Apparently I don't really know it either -- I just discovered that:

(string-match "\xff"     "\xff")  => 0
(string-match "[\xff]"   "\xff")  => 0
(string-match "\xffé?"   "\xff")  => nil
(string-match "[\xff]é?" "\xff")  => 0
(string-match "\xff"     "\xffé") => 0
(string-match "[\xff]"   "\xffé") => nil
(string-match "\xffé?"   "\xffé") => 0
(string-match "[\xff]é?" "\xffé") => nil

> OK, then we should document z-a as the preferred syntax (best go with the flow...). Done in the attached patch.

Actually, the only place where I saw z-a was in auctex (in negated form, [^z-a]).

>> As an experiment, I added detection of 'chained' ranges like [a-m-z] to xr and found a handful in both Emacs and GNU ELPA, but none of them carried a freeload of bugs. Keeping that check didn't seem worthwhile; the regexps may be a bit odd-looking, but aren't wrong.
> 
> It depends on what one means by "wrong". If one wants to use the ranges in both Emacs and grep they are "wrong", so it's reasonable for the manual to recommend against them.

Definitely agree that it should be discouraged. I've attached the ones found by a modified relint/xr, in case you are interested.

> It might also help for the trawler to warn about [X-Z] where Z = X+2. [XYZ] is clearer and less error-prone than [X-Z]. I shoehorned that into the attached patch too.

These seem to be rare; I found exactly one occurrence (lisp/gnus/message.el:1291):

 "[ \t]\\|[][!\"#$%&'()*+,-./0-9;<=>?@A-Z\\^_`a-z{|}~]+:"

which uses the punny range ,-. (possibly by benign accident).
Similarly, singleton ranges, X-X, are non-existent save for --- which I presume is an XEmacs workaround.

The latest xr version warns about 2-character ranges, except within digits because [0-1] etc was found to be common and harmless.

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 748ab586af..72ee9233a3 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
...
+A character alternative can include duplicates.  For example,
+@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.

Certainly, but does this need to be mentioned? Overlapping ranges are rarely written on purpose. Besides, duplication isn't confined to ranges.

More useful, I think, would be to recommend ranges to stay within natural sequences (letters, digits, etc) so that a reader needn't consult a table to see what is included. Thus [0-9.:/] good, [.-:] bad, even though they denote the same set.

+@item
+A @samp{-} also appear at the beginning of a character alternative, or

'appears'


[-- Attachment #2: chained-ranges.log --]
[-- Type: application/octet-stream, Size: 2310 bytes --]

;; -*- compilation -*-
lisp/gnus/nndoc.el:704:30: In call to re-search-forward: Attempt at chained ranges `A-Z-\' (pos 47)
  "^\\\\\\\\\n\\(Paper\\( (\\*cross-listing\\*)\\)?: [a-zA-Z-\\.]+/[0-9]+\\|arXiv:\\)"
   .........................................................^
lisp/gnus/nndoc.el:735:27: In call to looking-at: Attempt at chained ranges `A-Z-\' (pos 34)
  "^\\(Paper.*: \\|arXiv:\\)\\([0-9a-zA-Z-\\./]+\\)"
   ......................................^
lisp/org/org-eshell.el:40:29: In call to string-match: Attempt at chained ranges `0-9-+' (pos 12)
  "\\([A-Za-z0-9-+*]+\\):\\(.*\\)"
   .............^
lisp/org/org.el:432:3: In org-deadline-time-hour-regexp: Attempt at chained ranges `0-9-+' (pos 48)
  "\\<DEADLINE: *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9-+:hdwmy \t.]*\\)>"
   ......................................................^
lisp/org/org.el:448:3: In org-scheduled-time-hour-regexp: Attempt at chained ranges `0-9-+' (pos 49)
  "\\<SCHEDULED: *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9-+:hdwmy \t.]*\\)>"
   .......................................................^
lisp/progmodes/bat-mode.el:68:3: In bat-font-lock-keywords: Attempt at chained ranges `0-9-_' (pos 39)
  "\\_<\\(call\\|goto\\)\\_>[ \t]+%?\\([A-Za-z0-9-_\\:.]+\\)%?"
   ..............................................^
lisp/progmodes/bug-reference.el:72:3: In bug-reference-bug-regexp: Attempt at chained ranges `a-z-+' (pos 42)
  "\\([Bb]ug ?#?\\|[Pp]atch ?#\\|RFE ?#\\|PR [a-z-+]+/\\)\\([0-9]+\\(?:#[0-9]+\\)?\\)"
   ..............................................^
lisp/textmodes/less-css-mode.el:196:3: In less-css-font-lock-keywords: Attempt at chained ranges `a-z-_' (pos 12)
  "@[a-z_-][a-z-_0-9]*"
   ............^
lisp/textmodes/less-css-mode.el:196:3: In less-css-font-lock-keywords: Attempt at chained ranges `a-z-_' (pos 30)
  "\\(?:[ \t{;]\\|^\\)\\(\\.[a-z_-][a-z-_0-9]*\\)[ \t]*;"
   ....................................^
lisp/vc/vc-cvs.el:1090:27: In call to string-match: Attempt at chained ranges `A-Z-_' (pos 11)
  "[^a-z0-9A-Z-_]"
   ...........^
lisp/vc/vc-svn.el:762:27: In call to string-match: Attempt at chained ranges `A-Z-_' (pos 11)
  "[^a-z0-9A-Z-_]"
   ...........^
lisp/files.el:6319:28: In call to string-match: Attempt at chained ranges `0-9-_' (pos 11)
  "[^A-Za-z0-9-_.~#+]"
   ...........^

[-- Attachment #3: Type: text/plain, Size: 1 bytes --]



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 14:15               ` Mattias Engdegård
@ 2019-04-02 14:26                 ` Noam Postavsky
  2019-04-02 19:13                   ` Mattias Engdegård
  2019-04-02 16:58                 ` Stefan Monnier
  2019-04-02 22:08                 ` Paul Eggert
  2 siblings, 1 reply; 31+ messages in thread
From: Noam Postavsky @ 2019-04-02 14:26 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Paul Eggert, Stefan Monnier, Emacs developers

On Tue, 2 Apr 2019 at 10:15, Mattias Engdegård <mattiase@acm.org> wrote:

> > After looking into it I realized that I don't really know the semantics here (the text I recently added there seems to be wrong, in some cases), and I have my doubts that anyone else knows the semantics either. The attached patch simply gets rid of that section, leaving the area undocumented. User beware!
>
> Apparently I don't really know it either -- I just discovered that:
>
> (string-match "\xff"     "\xff")  => 0
> (string-match "[\xff]"   "\xff")  => 0
> (string-match "\xffé?"   "\xff")  => nil
> (string-match "[\xff]é?" "\xff")  => 0
> (string-match "\xff"     "\xffé") => 0
> (string-match "[\xff]"   "\xffé") => nil
> (string-match "\xffé?"   "\xffé") => 0
> (string-match "[\xff]é?" "\xffé") => nil

See also Bug#3687.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3687



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 14:15               ` Mattias Engdegård
  2019-04-02 14:26                 ` Noam Postavsky
@ 2019-04-02 16:58                 ` Stefan Monnier
  2019-04-02 22:08                 ` Paul Eggert
  2 siblings, 0 replies; 31+ messages in thread
From: Stefan Monnier @ 2019-04-02 16:58 UTC (permalink / raw)
  To: emacs-devel

> (string-match "\xff"     "\xff")  => 0
> (string-match "[\xff]"   "\xff")  => 0
> (string-match "\xffé?"   "\xff")  => nil
> (string-match "[\xff]é?" "\xff")  => 0
> (string-match "\xff"     "\xffé") => 0
> (string-match "[\xff]"   "\xffé") => nil
> (string-match "\xffé?"   "\xffé") => 0
> (string-match "[\xff]é?" "\xffé") => nil

Check (multibyte-string-p "...") on those strings, to see some of the
reasons why.  IIRC the treatment of those escape sequences to determine
unibyte/multibyte strings is pretty tricky (last time I looked at it,
I found its behavior to be undesirable, but I believe it has slightly
changed since and I can't remember what were the problems I bumped
into).


        Stefan




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 14:26                 ` Noam Postavsky
@ 2019-04-02 19:13                   ` Mattias Engdegård
  0 siblings, 0 replies; 31+ messages in thread
From: Mattias Engdegård @ 2019-04-02 19:13 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Paul Eggert, Stefan Monnier, Emacs developers

2 apr. 2019 kl. 16.26 skrev Noam Postavsky <npostavs@gmail.com>:
> 
> See also Bug#3687.
> 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3687

Thank you, that's precisely the same problem. Your memory is impressive.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 14:15               ` Mattias Engdegård
  2019-04-02 14:26                 ` Noam Postavsky
  2019-04-02 16:58                 ` Stefan Monnier
@ 2019-04-02 22:08                 ` Paul Eggert
  2019-04-03  4:52                   ` Eli Zaretskii
  2019-04-06  9:43                   ` Mattias Engdegård
  2 siblings, 2 replies; 31+ messages in thread
From: Paul Eggert @ 2019-04-02 22:08 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1879 bytes --]

On 4/2/19 7:15 AM, Mattias Engdegård wrote:
> where does a user go to understand extant regexps?

A user that *really* wants to know can go read the source code and get
confused, just like I did. :-)

But I think it's better if the documentation doesn't say what happens.
If you prefer that the documentation explicitly say that it doesn't say
what happens, I guess that would be OK too (what sort of wording would
you like, though?).

> (Do we have any latitude at all for changing even obscure corners of
> regexp syntax and semantics today?)

I would say so, certainly for the raw 8-bit-bytes in ranges stuff (where
nobody knows what they mean or even should mean), and possibly even for
some of the other rarely-used and questionable uses.


>
> I've attached the ones found by a modified relint/xr, in case you are interested.

Sure! Fixed in the attached patch.


>
> +A character alternative can include duplicates.  For example,
> +@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
>
> Certainly, but does this need to be mentioned? Overlapping ranges are rarely written on purpose. Besides, duplication isn't confined to ranges.

That example does contains non-range duplicates. I think duplicates are
worth mentioning (if only so that your trawler can point to the style
advice if people complain about the trawler being too picky :-).


> More useful, I think, would be to recommend ranges to stay within natural sequences (letters, digits, etc) so that a reader needn't consult a table to see what is included. Thus [0-9.:/] good, [.-:] bad, even though they denote the same set.
Good idea. I did that in the attached patch, which I just installed into
master and I hope addresses the points you raised. I hope that the Thai
example doesn't mess things up (I considered doing Arabic, which would
have been more fun :-).

[-- Attachment #2: 0001-Improve-regexp-advice-again-and-unchain-ranges.txt --]
[-- Type: text/plain, Size: 12591 bytes --]

From 4d6036021c36cd5f6f2d8beedf9e52298abc8d3a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 2 Apr 2019 15:00:59 -0700
Subject: [PATCH] Improve regexp advice again, and unchain ranges
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/lispref/searching.texi (Regexp Special):
Mention char classes earlier, in a more-logical place.
Advise sticking to ASCII letters and digits in ranges.
Reword negative advice to make it clearer that it’s negative.
* lisp/files.el (make-auto-save-file-name):
* lisp/gnus/message.el (message-mailer-swallows-blank-line):
* lisp/gnus/nndoc.el (nndoc-lanl-gov-announce-type-p)
(nndoc-generate-lanl-gov-head):
* lisp/org/org-eshell.el (org-eshell-open):
* lisp/org/org.el (org-deadline-time-hour-regexp)
(org-scheduled-time-hour-regexp):
* lisp/progmodes/bat-mode.el (bat-font-lock-keywords):
* lisp/progmodes/bug-reference.el (bug-reference-bug-regexp):
* lisp/textmodes/less-css-mode.el (less-css-font-lock-keywords):
* lisp/vc/vc-cvs.el (vc-cvs-valid-symbolic-tag-name-p):
* lisp/vc/vc-svn.el (vc-svn-valid-symbolic-tag-name-p):
Avoid attempts to chain ranges, as this can be confusing.
For example, instead of [0-9-_.], use [0-9_.-].
---
 doc/lispref/searching.texi      | 52 ++++++++++++++++++++-------------
 lisp/files.el                   |  2 +-
 lisp/gnus/message.el            |  2 +-
 lisp/gnus/nndoc.el              |  4 +--
 lisp/org/org-eshell.el          |  2 +-
 lisp/org/org.el                 |  4 +--
 lisp/progmodes/bat-mode.el      |  2 +-
 lisp/progmodes/bug-reference.el |  2 +-
 lisp/textmodes/less-css-mode.el |  4 +--
 lisp/vc/vc-cvs.el               |  2 +-
 lisp/vc/vc-svn.el               |  2 +-
 11 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 72ee9233a3..8775254dd0 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -395,9 +395,18 @@ Regexp Special
 range should not be the starting point of another one; for example,
 @samp{[a-m-z]} should be avoided.
 
+A character alternative can also specify named character classes
+(@pxref{Char Classes}).  This is a POSIX feature.  For example,
+@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
+Using a character class is equivalent to mentioning each of the
+characters in that class; but the latter is not feasible in practice,
+since some classes include thousands of different characters.
+A character class should not appear as the lower or upper bound
+of a range.
+
 The usual regexp special characters are not special inside a
 character alternative.  A completely different set of characters is
-special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
+special: @samp{]}, @samp{-} and @samp{^}.
 To include @samp{]} in a character alternative, put it at the
 beginning.  To include @samp{^}, put it anywhere but at the beginning.
 To include @samp{-}, put it at the end.  Thus, @samp{[]^-]} matches
@@ -430,33 +439,36 @@ Regexp Special
 @end enumerate
 
 Some kinds of character alternatives are not the best style even
-though they are standardized by POSIX and are portable.  They include:
+though they have a well-defined meaning in Emacs.  They include:
 
 @enumerate
 @item
-A character alternative can include duplicates.  For example,
-@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
+Although a range's bound can be almost any character, it is better
+style to stay within natural sequences of ASCII letters and digits
+because most people have not memorized character code tables.
+For example, @samp{[.-9]} is less clear than @samp{[./0-9]},
+and @samp{[`-~]} is less clear than @samp{[`a-z@{|@}~]}.
+Unicode character escapes can help here; for example, for most programmers
+@samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
 
 @item
-A range can denote just one, two, or three characters.  For example,
-@samp{[(-(]} is less clear than @samp{[(]}, @samp{[*-+]} is less clear
-than @samp{[*+]}, and @samp{[*-,]} is less clear than @samp{[*+,]}.
+Although a character alternative can include duplicates, it is better
+style to avoid them.  For example, @samp{[XYa-yYb-zX]} is less clear
+than @samp{[XYa-z]}.
 
 @item
-A @samp{-} also appear at the beginning of a character alternative, or
-as the upper bound of a range.  For example, although @samp{[-a-z]} is
-valid, @samp{[a-z-]} is better style; and although @samp{[!--/]} is
-valid, @samp{[!-,/-]} is clearer.
-@end enumerate
+Although a range can denote just one, two, or three characters, it
+is simpler to list the characters.  For example,
+@samp{[a-a0]} is less clear than @samp{[a0]}, @samp{[i-j]} is less clear
+than @samp{[ij]}, and @samp{[i-k]} is less clear than @samp{[ijk]}.
 
-A character alternative can also specify named character classes
-(@pxref{Char Classes}).  This is a POSIX feature.  For example,
-@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
-Using a character class is equivalent to mentioning each of the
-characters in that class; but the latter is not feasible in practice,
-since some classes include thousands of different characters.
-A character class should not appear as the lower or upper bound
-of a range.
+@item
+Although a @samp{-} can appear at the beginning of a character
+alternative or as the upper bound of a range, it is better style to
+put @samp{-} by itself at the end of a character alternative.  For
+example, although @samp{[-a-z]} is valid, @samp{[a-z-]} is better
+style; and although @samp{[*--]} is valid, @samp{[*+,-]} is clearer.
+@end enumerate
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
diff --git a/lisp/files.el b/lisp/files.el
index 77a194b085..1dae57593a 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -6316,7 +6316,7 @@ make-auto-save-file-name
       ;; We do this on all platforms, because even if we are not
       ;; running on DOS/Windows, the current directory may be on a
       ;; mounted VFAT filesystem, such as a USB memory stick.
-      (while (string-match "[^A-Za-z0-9-_.~#+]" buffer-name limit)
+      (while (string-match "[^A-Za-z0-9_.~#+-]" buffer-name limit)
 	(let* ((character (aref buffer-name (match-beginning 0)))
 	       (replacement
                 ;; For multibyte characters, this will produce more than
diff --git a/lisp/gnus/message.el b/lisp/gnus/message.el
index dae4b0dced..c8b6f0ee68 100644
--- a/lisp/gnus/message.el
+++ b/lisp/gnus/message.el
@@ -1288,7 +1288,7 @@ message-mailer-swallows-blank-line
       ;; According to RFC 822 and its successors, the field name must
       ;; consist of printable US-ASCII characters other than colon,
       ;; i.e., decimal 33-56 and 59-126.
-      '(looking-at "[ \t]\\|[][!\"#$%&'()*+,-./0-9;<=>?@A-Z\\^_`a-z{|}~]+:"))
+      '(looking-at "[ \t]\\|[][!\"#$%&'()*+,./0-9;<=>?@A-Z\\^_`a-z{|}~-]+:"))
   "Set this non-nil if the system's mailer runs the header and body together.
 \(This problem exists on Sunos 4 when sendmail is run in remote mode.)
 The value should be an expression to test whether the problem will
diff --git a/lisp/gnus/nndoc.el b/lisp/gnus/nndoc.el
index 8f1217b127..532ba11fa0 100644
--- a/lisp/gnus/nndoc.el
+++ b/lisp/gnus/nndoc.el
@@ -701,7 +701,7 @@ nndoc-transform-git-headers
 
 (defun nndoc-lanl-gov-announce-type-p ()
   (when (let ((case-fold-search nil))
-	  (re-search-forward "^\\\\\\\\\n\\(Paper\\( (\\*cross-listing\\*)\\)?: [a-zA-Z-\\.]+/[0-9]+\\|arXiv:\\)" nil t))
+	  (re-search-forward "^\\\\\\\\\n\\(Paper\\( (\\*cross-listing\\*)\\)?: [a-zA-Z\\.-]+/[0-9]+\\|arXiv:\\)" nil t))
     t))
 
 (defun nndoc-transform-lanl-gov-announce (article)
@@ -732,7 +732,7 @@ nndoc-generate-lanl-gov-head
       (save-restriction
 	(narrow-to-region (car entry) (nth 1 entry))
 	(goto-char (point-min))
-	(when (looking-at "^\\(Paper.*: \\|arXiv:\\)\\([0-9a-zA-Z-\\./]+\\)")
+	(when (looking-at "^\\(Paper.*: \\|arXiv:\\)\\([0-9a-zA-Z\\./-]+\\)")
 	  (setq subject (concat " (" (match-string 2) ")"))
 	  (when (re-search-forward "^From: \\(.*\\)" nil t)
 	    (setq from (concat "<"
diff --git a/lisp/org/org-eshell.el b/lisp/org/org-eshell.el
index bb27d92e12..2251a1b892 100644
--- a/lisp/org/org-eshell.el
+++ b/lisp/org/org-eshell.el
@@ -37,7 +37,7 @@ org-eshell-open
    eshell buffer) or a command line prefixed by a buffer name
    followed by a colon."
   (let* ((buffer-and-command
-          (if (string-match "\\([A-Za-z0-9-+*]+\\):\\(.*\\)" link)
+          (if (string-match "\\([A-Za-z0-9+*-]+\\):\\(.*\\)" link)
 	      (list (match-string 1 link)
 		    (match-string 2 link))
             (list eshell-buffer-name link)))
diff --git a/lisp/org/org.el b/lisp/org/org.el
index bf7e305b7a..ce6dd24a83 100644
--- a/lisp/org/org.el
+++ b/lisp/org/org.el
@@ -430,7 +430,7 @@ org-deadline-time-regexp
 
 (defconst org-deadline-time-hour-regexp
   (concat "\\<" org-deadline-string
-	  " *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9-+:hdwmy \t.]*\\)>")
+	  " *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9+:hdwmy \t.-]*\\)>")
   "Matches the DEADLINE keyword together with a time-and-hour stamp.")
 
 (defconst org-deadline-line-regexp
@@ -446,7 +446,7 @@ org-scheduled-time-regexp
 
 (defconst org-scheduled-time-hour-regexp
   (concat "\\<" org-scheduled-string
-	  " *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9-+:hdwmy \t.]*\\)>")
+	  " *<\\([^>]+[0-9]\\{1,2\\}:[0-9]\\{2\\}[0-9+:hdwmy \t.-]*\\)>")
   "Matches the SCHEDULED keyword together with a time-and-hour stamp.")
 
 (defconst org-closed-time-regexp
diff --git a/lisp/progmodes/bat-mode.el b/lisp/progmodes/bat-mode.el
index 6c85ff9905..a8b002be59 100644
--- a/lisp/progmodes/bat-mode.el
+++ b/lisp/progmodes/bat-mode.el
@@ -78,7 +78,7 @@ bat-font-lock-keywords
              "goto" "gtr" "if" "in" "leq" "lss" "neq" "not" "start"))
           (UNIX
            '("bash" "cat" "cp" "fgrep" "grep" "ls" "sed" "sh" "mv" "rm")))
-      `(("\\_<\\(call\\|goto\\)\\_>[ \t]+%?\\([A-Za-z0-9-_\\:.]+\\)%?"
+      `(("\\_<\\(call\\|goto\\)\\_>[ \t]+%?\\([A-Za-z0-9_\\:.-]+\\)%?"
          (2 font-lock-constant-face t))
         ("^:[^:].*"
          . 'bat-label-face)
diff --git a/lisp/progmodes/bug-reference.el b/lisp/progmodes/bug-reference.el
index 8baf74854f..759db1f568 100644
--- a/lisp/progmodes/bug-reference.el
+++ b/lisp/progmodes/bug-reference.el
@@ -69,7 +69,7 @@ bug-reference-url-format
                 (get s 'bug-reference-url-format)))))
 
 (defcustom bug-reference-bug-regexp
-  "\\([Bb]ug ?#?\\|[Pp]atch ?#\\|RFE ?#\\|PR [a-z-+]+/\\)\\([0-9]+\\(?:#[0-9]+\\)?\\)"
+  "\\([Bb]ug ?#?\\|[Pp]atch ?#\\|RFE ?#\\|PR [a-z+-]+/\\)\\([0-9]+\\(?:#[0-9]+\\)?\\)"
   "Regular expression matching bug references.
 The second subexpression should match the bug reference (usually a number)."
   :type 'string
diff --git a/lisp/textmodes/less-css-mode.el b/lisp/textmodes/less-css-mode.el
index b4c7f28985..4077789eb1 100644
--- a/lisp/textmodes/less-css-mode.el
+++ b/lisp/textmodes/less-css-mode.el
@@ -194,10 +194,10 @@ less-css-compile
 ;; - custom faces.
 (defconst less-css-font-lock-keywords
   '(;; Variables
-    ("@[a-z_-][a-z-_0-9]*" . font-lock-variable-name-face)
+    ("@[a-z_-][a-z_0-9-]*" . font-lock-variable-name-face)
     ("&" . font-lock-preprocessor-face)
     ;; Mixins
-    ("\\(?:[ \t{;]\\|^\\)\\(\\.[a-z_-][a-z-_0-9]*\\)[ \t]*;" .
+    ("\\(?:[ \t{;]\\|^\\)\\(\\.[a-z_-][a-z_0-9-]*\\)[ \t]*;" .
      (1 font-lock-keyword-face))))
 
 (defvar less-css-mode-syntax-table
diff --git a/lisp/vc/vc-cvs.el b/lisp/vc/vc-cvs.el
index 3bbd0ed49b..626e190c1e 100644
--- a/lisp/vc/vc-cvs.el
+++ b/lisp/vc/vc-cvs.el
@@ -1087,7 +1087,7 @@ vc-cvs-valid-symbolic-tag-name-p
   ;; an uppercase or lowercase letter and can contain uppercase and
   ;; lowercase letters, digits, `-', and `_'.
   (and (string-match "^[a-zA-Z]" tag)
-       (not (string-match "[^a-z0-9A-Z-_]" tag))))
+       (not (string-match "[^a-z0-9A-Z_-]" tag))))
 
 (defun vc-cvs-valid-revision-number-p (tag)
   "Return non-nil if TAG is a valid revision number."
diff --git a/lisp/vc/vc-svn.el b/lisp/vc/vc-svn.el
index 618f03eedc..3c50c8fff6 100644
--- a/lisp/vc/vc-svn.el
+++ b/lisp/vc/vc-svn.el
@@ -759,7 +759,7 @@ vc-svn-valid-symbolic-tag-name-p
   ;; an uppercase or lowercase letter and can contain uppercase and
   ;; lowercase letters, digits, `-', and `_'.
   (and (string-match "^[a-zA-Z]" tag)
-       (not (string-match "[^a-z0-9A-Z-_]" tag))))
+       (not (string-match "[^a-z0-9A-Z_-]" tag))))
 
 (defun vc-svn-valid-revision-number-p (tag)
   "Return non-nil if TAG is a valid revision number."
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 22:08                 ` Paul Eggert
@ 2019-04-03  4:52                   ` Eli Zaretskii
  2019-04-03 17:02                     ` Paul Eggert
  2019-04-06  9:43                   ` Mattias Engdegård
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2019-04-03  4:52 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, monnier, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 2 Apr 2019 15:08:43 -0700
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > (Do we have any latitude at all for changing even obscure corners of
> > regexp syntax and semantics today?)
> 
> I would say so, certainly for the raw 8-bit-bytes in ranges stuff (where
> nobody knows what they mean or even should mean), and possibly even for
> some of the other rarely-used and questionable uses.

Beware: Emacs sometimes does use regexps when dealing with unibyte
buffers and strings, where these ranges could be significant.  So I'd
suggest a careful audit of such places, to see whether raw bytes
could be an issue, because the resulting breakage could be subtle and
not become apparent until much later.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-03  4:52                   ` Eli Zaretskii
@ 2019-04-03 17:02                     ` Paul Eggert
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Eggert @ 2019-04-03 17:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, monnier, emacs-devel

On 4/2/19 9:52 PM, Eli Zaretskii wrote:
> Emacs sometimes does use regexps when dealing with unibyte
> buffers and strings, where these ranges could be significant. 

Quite possibly we'd want to support the case where a range in a unibyte
pattern is used against a unibyte buffer or string, as the semantics are
straightforward there and the code should work (or at least I think it
should work; the implementation is not clear). The troublesome cases
with ranges and raw 8-bit bytes occur when either the pattern or the
buffer/string is multibyte - there it's not clear what the
implementation *should* do, much less what it does.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-02 22:08                 ` Paul Eggert
  2019-04-03  4:52                   ` Eli Zaretskii
@ 2019-04-06  9:43                   ` Mattias Engdegård
  2019-04-07  8:15                     ` Michael Albinus
  2019-04-07  9:47                     ` Paul Eggert
  1 sibling, 2 replies; 31+ messages in thread
From: Mattias Engdegård @ 2019-04-06  9:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, Emacs developers

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

3 apr. 2019 kl. 00.08 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> On 4/2/19 7:15 AM, Mattias Engdegård wrote:
>> where does a user go to understand extant regexps?
> 
> A user that *really* wants to know can go read the source code and get
> confused, just like I did. :-)
> 
> But I think it's better if the documentation doesn't say what happens.
> If you prefer that the documentation explicitly say that it doesn't say
> what happens, I guess that would be OK too (what sort of wording would
> you like, though?).

I was mostly thinking about regexps like "^***" which do occur in actual code (by mistake), and are almost but not quite explained in the manual. But perhaps some knowledge is better hidden.

>> I've attached the ones found by a modified relint/xr, in case you are interested.
> 
> Sure! Fixed in the attached patch.

Thank you, it looks fine to me. Some tool improvements uncovered more oddities; log attached.


[-- Attachment #2: relint.log --]
[-- Type: application/octet-stream, Size: 1083 bytes --]

;; -*- compilation -*-
lisp/net/tramp.el:2811:30: `skip-regexp' cannot be used for arguments to `skip-chars-forward'
lisp/progmodes/sh-script.el:2909:37: In call to skip-chars-forward: Suspect skip set framed in `[...]' (pos 0)
  "[a-z0-9]*?"
   ^
lisp/progmodes/verilog-mode.el:14269:38: `sig-re' cannot be used for arguments to `skip-chars-backward'
lisp/progmodes/verilog-mode.el:14272:37: `sig-re' cannot be used for arguments to `skip-chars-forward'
lisp/vc/log-edit.el:353:10: In call to re-search-forward: Literal `-' not first or last in character alternative (pos 41)
  "^\\([^[:alpha:]]\\|[[:alnum:]-]+[^[:alnum:]-:]\\)"
   ...........................................^
lisp/xml.el:721:25: In call to re-search-forward: Literal `-' not first or last in character alternative (pos 24)
  "\\=\"\\([[:space:][:alnum:]-'()+,./:=?;!*#@$_%]*\\)\""
   ...........................^
lisp/xml.el:724:25: In call to re-search-forward: Literal `-' not first or last in character alternative (pos 24)
  "\\='\\([[:space:][:alnum:]-()+,./:=?;!*#@$_%]*\\)'"
   ..........................^

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-06  9:43                   ` Mattias Engdegård
@ 2019-04-07  8:15                     ` Michael Albinus
  2019-04-07  9:47                     ` Paul Eggert
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Albinus @ 2019-04-07  8:15 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Paul Eggert, Stefan Monnier, Emacs developers

Mattias Engdegård <mattiase@acm.org> writes:

Hi Mattias,

> lisp/net/tramp.el:2811:30: `skip-regexp' cannot be used for arguments to `skip-chars-forward'

Thanks for the report, I've fixed this in the master.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-06  9:43                   ` Mattias Engdegård
  2019-04-07  8:15                     ` Michael Albinus
@ 2019-04-07  9:47                     ` Paul Eggert
  2019-04-07 10:06                       ` Mattias Engdegård
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2019-04-07  9:47 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 185 bytes --]

Mattias Engdegård wrote:
> Some tool improvements uncovered more oddities; log attached.

Thanks, I installed the attached to fix the oddities that Michael hadn't already 
fixed.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-more-regexp-oddities.patch --]
[-- Type: text/x-patch; name="0001-Fix-more-regexp-oddities.patch", Size: 3265 bytes --]

From 659ab82fac9f8f2823b1729e85769627c77f140e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 7 Apr 2019 02:44:37 -0700
Subject: [PATCH] Fix more regexp oddities
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problems reported by Mattias Engdegård in:
https://lists.gnu.org/archive/html/emacs-devel/2019-04/msg00178.html
* lisp/progmodes/sh-script.el (sh-get-indent-info):
Reorder skip-chars-forward arg so that it does not look like a regexp.
* lisp/progmodes/verilog-mode.el (verilog-sk-define-signal):
Fix typo: the string is not a regexp.
* lisp/vc/log-edit.el (log-edit-goto-eoh): Fix typo: stray ‘:’.
* lisp/xml.el (xml-parse-dtd): Avoid ‘-’ right after char class.
---
 lisp/progmodes/sh-script.el    | 3 +--
 lisp/progmodes/verilog-mode.el | 2 +-
 lisp/vc/log-edit.el            | 2 +-
 lisp/xml.el                    | 4 ++--
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/lisp/progmodes/sh-script.el b/lisp/progmodes/sh-script.el
index dd3a6fa411..853a3500ee 100644
--- a/lisp/progmodes/sh-script.el
+++ b/lisp/progmodes/sh-script.el
@@ -2905,8 +2905,7 @@ sh-get-indent-info
 		      (setq align-point (point))))
 		(or (bobp)
 		    (forward-char -1))
-                ;; FIXME: This charset looks too much like a regexp.  --Stef
-		(skip-chars-forward "[a-z0-9]*?")
+		(skip-chars-forward "*0-9?[]a-z")
 		)
 	       ((string-match "[])}]" x)
 		(setq x (sh-safe-forward-sexp -1))
diff --git a/lisp/progmodes/verilog-mode.el b/lisp/progmodes/verilog-mode.el
index 7b9c3921fb..916594bdde 100644
--- a/lisp/progmodes/verilog-mode.el
+++ b/lisp/progmodes/verilog-mode.el
@@ -14263,7 +14263,7 @@ verilog-sk-def-reg
 (defun verilog-sk-define-signal ()
   "Insert a definition of signal under point at top of module."
   (interactive "*")
-  (let* ((sig-re "[a-zA-Z0-9_]*")
+  (let* ((sig-re "a-zA-Z0-9_")
 	 (v1 (buffer-substring
               (save-excursion
                 (skip-chars-backward sig-re)
diff --git a/lisp/vc/log-edit.el b/lisp/vc/log-edit.el
index 8bd1bbddb7..42710dd8dc 100644
--- a/lisp/vc/log-edit.el
+++ b/lisp/vc/log-edit.el
@@ -350,7 +350,7 @@ log-edit-match-to-eoh
 (defun log-edit-goto-eoh ()             ;FIXME: Almost rfc822-goto-eoh!
   (goto-char (point-min))
   (when (re-search-forward
-	 "^\\([^[:alpha:]]\\|[[:alnum:]-]+[^[:alnum:]-:]\\)" nil 'move)
+	 "^\\([^[:alpha:]]\\|[[:alnum:]-]+[^[:alnum:]-]\\)" nil 'move)
     (goto-char (match-beginning 0))))
 
 (defun log-edit--match-first-line (limit)
diff --git a/lisp/xml.el b/lisp/xml.el
index 2337952f06..b5b923f863 100644
--- a/lisp/xml.el
+++ b/lisp/xml.el
@@ -718,10 +718,10 @@ xml-parse-dtd
     (cond ((looking-at "PUBLIC\\s-+")
 	   (goto-char (match-end 0))
 	   (unless (or (re-search-forward
-			"\\=\"\\([[:space:][:alnum:]-'()+,./:=?;!*#@$_%]*\\)\""
+			"\\=\"\\([[:space:][:alnum:]'()+,./:=?;!*#@$_%-]*\\)\""
 			nil t)
 		       (re-search-forward
-			"\\='\\([[:space:][:alnum:]-()+,./:=?;!*#@$_%]*\\)'"
+			"\\='\\([[:space:][:alnum:]()+,./:=?;!*#@$_%-]*\\)'"
 			nil t))
 	     (error "XML: Missing Public ID"))
 	   (let ((pubid (match-string-no-properties 1)))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-07  9:47                     ` Paul Eggert
@ 2019-04-07 10:06                       ` Mattias Engdegård
  2019-04-07 18:45                         ` Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Mattias Engdegård @ 2019-04-07 10:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs developers

7 apr. 2019 kl. 11.47 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> Thanks, I installed the attached to fix the oddities that Michael hadn't already fixed.

Looks fine, thank you! Minor detail:

-  (let* ((sig-re "[a-zA-Z0-9_]*")
+  (let* ((sig-re "a-zA-Z0-9_")

It isn't a regexp; rename to sig-chars maybe?
To silence the checker, if nothing else. It latched on to the name, on the grounds that even if the contents look all right now, someone will one day believe the name and put a regexp in it.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Scan of regexps in Emacs (March 17)
  2019-04-07 10:06                       ` Mattias Engdegård
@ 2019-04-07 18:45                         ` Paul Eggert
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Eggert @ 2019-04-07 18:45 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Emacs developers

Mattias Engdegård wrote:
> -  (let* ((sig-re "[a-zA-Z0-9_]*")
> +  (let* ((sig-re "a-zA-Z0-9_")
> 
> It isn't a regexp; rename to sig-chars maybe?

Sure, please feel free to do that.



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-04-07 18:45 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-17 13:50 Scan of regexps in Emacs (March 17) Mattias Engdegård
2019-03-19  1:21 ` Paul Eggert
2019-03-19 10:34   ` Mattias Engdegård
2019-03-20  1:53     ` Paul Eggert
2019-03-20  2:20       ` Stefan Monnier
2019-03-20 22:01         ` Paul Eggert
2019-03-20 22:59           ` Drew Adams
2019-03-20 23:10             ` Paul Eggert
2019-03-21  3:38               ` Eli Zaretskii
     [not found]             ` <<deeccd91-0f43-c329-6087-17435550b328@cs.ucla.edu>
     [not found]               ` <<83d0mk6go5.fsf@gnu.org>
2019-03-21  4:21                 ` Drew Adams
2019-03-21 14:17                   ` Eli Zaretskii
2019-03-21  0:57           ` Stefan Monnier
2019-03-21 11:15           ` Mattias Engdegård
2019-04-02  7:33             ` Paul Eggert
2019-04-02 14:15               ` Mattias Engdegård
2019-04-02 14:26                 ` Noam Postavsky
2019-04-02 19:13                   ` Mattias Engdegård
2019-04-02 16:58                 ` Stefan Monnier
2019-04-02 22:08                 ` Paul Eggert
2019-04-03  4:52                   ` Eli Zaretskii
2019-04-03 17:02                     ` Paul Eggert
2019-04-06  9:43                   ` Mattias Engdegård
2019-04-07  8:15                     ` Michael Albinus
2019-04-07  9:47                     ` Paul Eggert
2019-04-07 10:06                       ` Mattias Engdegård
2019-04-07 18:45                         ` Paul Eggert
2019-03-21  2:07         ` Richard Stallman
2019-03-22 13:26         ` Stephen Leake
2019-03-22 14:03           ` Stefan Monnier
2019-03-22 14:12           ` Mattias Engdegård
2019-03-20 10:04       ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).