Emacs regexp scan (Sep 29)

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Emacs regexp scan (Sep 29)
@ 2019-09-29 19:39 Mattias Engdegård
  2019-10-04 21:42 ` Paul Eggert
  0 siblings, 1 reply; 24+ messages in thread
From: Mattias Engdegård @ 2019-09-29 19:39 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 148 bytes --]

Time for another of these, after some scanner improvements.
It looks like a lot, but it's probably just a handful errors causing many warnings.


[-- Attachment #2: relint.log --]
[-- Type: application/octet-stream, Size: 137572 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-09-29 19:39 Emacs regexp scan (Sep 29) Mattias Engdegård
@ 2019-10-04 21:42 ` Paul Eggert
  2019-10-05  8:10   ` Eli Zaretskii
  2019-10-05 10:03   ` Mattias Engdegård
  0 siblings, 2 replies; 24+ messages in thread
From: Paul Eggert @ 2019-10-04 21:42 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 185 bytes --]

Thanks, I installed the attached patch, which I hope fixes all the bugs 
and style glitches uncovered by that scan, along with some other style 
glitches I noticed in the neighborhood.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-bugs-found-by-2019-09-29-regexp-scanner.patch --]
[-- Type: text/x-patch; name="0001-Fix-bugs-found-by-2019-09-29-regexp-scanner.patch", Size: 15105 bytes --]

From 57556fa21da431b47b2f3842ddb61273f5d39820 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 4 Oct 2019 14:38:22 -0700
Subject: [PATCH] Fix bugs found by 2019-09-29 regexp scanner
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problems reported by Mattias EngdegÃ¥rd in:
https://lists.gnu.org/r/emacs-devel/2019-09/threads.html
* lisp/calendar/iso8601.el (iso8601--year-match)
(iso8601--full-date-match, iso8601--without-day-match)
(iso8601--week-date-match, iso8601--ordinal-date-match)
(iso8601--zone-match):
* lisp/textmodes/rst.el (rst-re-alist-def):
Put â€˜-â€™ at the end of bracketed ranges, following the style
suggestion in the Elisp manual.
(iso8601--time-match): Use \([0-9]*\) instead of \([0-9]+\)?
to pacify the regexp scanner.
(iso8601-parse-time): Adjust accordingly.
* lisp/language/burmese.el (burmese-composable-pattern):
* lisp/language/indian.el (devanagari-composable-pattern)
(bengali-composable-pattern, gurmukhi-composable-pattern)
(gujarati-composable-pattern, oriya-composable-pattern)
(telugu-composable-pattern, kannada-composable-pattern)
(malayalam-composable-pattern):
Prefer [ab] to [a-b] when the characters differ by 1,
to pacify the regexp scanner.
* lisp/language/burmese.el (burmese-composable-pattern):
Fix missing-â€˜\uâ€™ typos.
* lisp/language/indian.el (gurmukhi-composable-pattern):
Fix missing-â€˜\â€™ typo.
* lisp/language/tibetan.el (tibetan-regexp):
Quote â€˜+â€™ in regexp to pacify the regexp scanner.  Simplify.
* lisp/textmodes/rst.el (rst-re-alist-def): Fix â€˜[]-'...]â€™
typo by putting the â€˜-â€™ at end of the bracketed expression.
---
 lisp/calendar/iso8601.el | 18 ++++++-------
 lisp/language/burmese.el |  4 +--
 lisp/language/indian.el  | 58 ++++++++++++++++++++--------------------
 lisp/language/tibetan.el | 23 +++++++---------
 lisp/textmodes/rst.el    |  8 +++---
 5 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/lisp/calendar/iso8601.el b/lisp/calendar/iso8601.el
index 3ff91d910c..78a94d47be 100644
--- a/lisp/calendar/iso8601.el
+++ b/lisp/calendar/iso8601.el
@@ -62,17 +62,17 @@ iso8601--concat-regexps
              regexps "\\|"))
 
 (defconst iso8601--year-match
-  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)")
+  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)")
 (defconst iso8601--full-date-match
-  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)-?\\([0-9][0-9]\\)-?\\([0-9][0-9]\\)")
+  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)-?\\([0-9][0-9]\\)-?\\([0-9][0-9]\\)")
 (defconst iso8601--without-day-match
-  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)")
+  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)")
 (defconst iso8601--outdated-date-match
   "--\\([0-9][0-9]\\)-?\\([0-9][0-9]\\)")
 (defconst iso8601--week-date-match
-  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)-?W\\([0-9][0-9]\\)-?\\([0-9]\\)?")
+  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)-?W\\([0-9][0-9]\\)-?\\([0-9]\\)?")
 (defconst iso8601--ordinal-date-match
-  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)-?\\([0-9][0-9][0-9]\\)")
+  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)-?\\([0-9][0-9][0-9]\\)")
 (defconst iso8601--date-match
   (iso8601--concat-regexps
    (list iso8601--year-match
@@ -83,10 +83,10 @@ iso8601--date-match
          iso8601--ordinal-date-match)))
 
 (defconst iso8601--time-match
-  "\\([0-9][0-9]\\):?\\([0-9][0-9]\\)?:?\\([0-9][0-9]\\)?[.,]?\\([0-9]+\\)?")
+  "\\([0-9][0-9]\\):?\\([0-9][0-9]\\)?:?\\([0-9][0-9]\\)?[.,]?\\([0-9]*\\)")
 
 (defconst iso8601--zone-match
-  "\\(Z\\|\\([-+]\\)\\([0-9][0-9]\\):?\\([0-9][0-9]\\)?\\)")
+  "\\(Z\\|\\([+-]\\)\\([0-9][0-9]\\):?\\([0-9][0-9]\\)?\\)")
 
 (defconst iso8601--full-time-match
   (concat "\\(" (replace-regexp-in-string "(" "(?:" iso8601--time-match) "\\)"
@@ -142,7 +142,7 @@ iso8601-parse
 (defun iso8601-parse-date (string)
   "Parse STRING (in ISO 8601 format) and return a `decode-time' value."
   (cond
-   ;; Just a year: [-+]YYYY.
+   ;; Just a year: [+-]YYYY.
    ((iso8601--match iso8601--year-match string)
     (iso8601--decoded-time
      :year (iso8601--adjust-year (match-string 1 string)
@@ -236,7 +236,7 @@ iso8601-parse-time
                            (string-to-number (match-string 2 time))))
               (second (and (match-string 3 time)
                            (string-to-number (match-string 3 time))))
-              (fraction (and (match-string 4 time)
+	      (fraction (and (not (zerop (length (match-string 4 time))))
                              (string-to-number (match-string 4 time)))))
           (when (and fraction
                      (eq form t))
diff --git a/lisp/language/burmese.el b/lisp/language/burmese.el
index 25425ec485..7f2a99a41a 100644
--- a/lisp/language/burmese.el
+++ b/lisp/language/burmese.el
@@ -39,11 +39,11 @@
 (defvar burmese-composable-pattern
   (let ((table
 	 '(("K" . "[\u1004\u105A]\u103A\u1039") ; KINZI sequence
-	   ("C" . "[\u1000-\u102A\u103F\u1041-\u1049\u104E\u105A-\u105D\u1061\u1065-\u1066\u106E\u1071\u1075\u1081\u108E\uAA60-\uAA6F\uAA71-\uAA76]") ; consonant and vowel letter
+	   ("C" . "[\u1000-\u102A\u103F\u1041-\u1049\u104E\u105A-\u105D\u1061\u1065\u1066\u106E\u1071\u1075\u1081\u108E\uAA60-\uAA6F\uAA71-\uAA76]") ; consonant and vowel letter
 	   ("V" . "\u1039")					   ; VIRAMA
 	   ("A" . "\u103A")					   ; ASAT
 	   ("S" . "[\u1000-\u1019\u101C\u101E\u1020\u1021\u105A]") ; subscript
-	   ("M" . "[\u103B-\u103E\105E-\1060]") ; medial
+	   ("M" . "[\u103B-\u103E\u105E-\u1060]") ; medial
 	   ("v" . "[\u102B-\u103A\u103C-\u103E\u1062-\u1064\u1067-\u106D\u1071-\u1074\u1082-\u108D\u108F\u109A\u109C\uAA70]"))) ; vowel sign, etc.
 	(regexp "\\(K\\)?C\\(VS\\)?\\(VS\\)?A?M*v*"))
     (let ((case-fold-search nil))
diff --git a/lisp/language/indian.el b/lisp/language/indian.el
index f1e61a354c..4013faca7c 100644
--- a/lisp/language/indian.el
+++ b/lisp/language/indian.el
@@ -139,14 +139,14 @@ devanagari-composable-pattern
   (let ((table
 	 '(("a" . "[\u0900-\u0902]")	; vowel modifier (above)
 	   ("A" . "\u0903")		; vowel modifier (post)
-	   ("V" . "[\u0904-\u0914\u0960-\u0961\u0972]") ; independent vowel
+	   ("V" . "[\u0904-\u0914\u0960\u0961\u0972]") ; independent vowel
 	   ("C" . "[\u0915-\u0939\u0958-\u095F\u0979-\u097F]") ; consonant
 	   ("R" . "\u0930")		; RA
 	   ("n" . "\u093C")		; NUKTA
-	   ("v" . "[\u093E-\u094C\u094E\u0955\u0962-\u0963]") ; vowel sign
+	   ("v" . "[\u093E-\u094C\u094E\u0955\u0962\u0963]") ; vowel sign
 	   ("H" . "\u094D")		; HALANT
-	   ("s" . "[\u0951-\u0952]")	; stress sign
-	   ("t" . "[\u0953-\u0954]")	; accent
+	   ("s" . "[\u0951\u0952]")	; stress sign
+	   ("t" . "[\u0953\u0954]")	; accent
 	   ("N" . "\u200C")		; ZWNJ
 	   ("J" . "\u200D")		; ZWJ
 	   ("X" . "[\u0900-\u097F]"))))	; all coverage
@@ -166,13 +166,13 @@ devanagari-composable-pattern
 (defconst bengali-composable-pattern
   (let ((table
 	 '(("a" . "\u0981")		; SIGN CANDRABINDU
-	   ("A" . "[\u0982-\u0983]")	; SIGN ANUSVARA .. VISARGA
-	   ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
+	   ("A" . "[\u0982\u0983]")	; SIGN ANUSVARA .. VISARGA
+	   ("V" . "[\u0985-\u0994\u09E0\u09E1]") ; independent vowel
 	   ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
-	   ("B" . "[\u09AC\u09AF-\u09B0\u09F0]")		; BA, YA, RA
+	   ("B" . "[\u09AC\u09AF\u09B0\u09F0]")		; BA, YA, RA
 	   ("R" . "[\u09B0\u09F0]")		; RA
 	   ("n" . "\u09BC")		; NUKTA
-	   ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
+	   ("v" . "[\u09BE-\u09CC\u09D7\u09E2\u09E3]") ; vowel sign
 	   ("H" . "\u09CD")		; HALANT
 	   ("T" . "\u09CE")		; KHANDA TA
 	   ("N" . "\u200C")		; ZWNJ
@@ -195,11 +195,11 @@ bengali-composable-pattern
 
 (defconst gurmukhi-composable-pattern
   (let ((table
-	 '(("a" . "[\u0A01-\u0A02\u0A70]") ; SIGN ADAK BINDI .. BINDI, TIPPI
+	 '(("a" . "[\u0A01\u0A02\u0A70]") ; SIGN ADAK BINDI .. BINDI, TIPPI
 	   ("A" . "\u0A03")		; SIGN VISARGA
 	   ("V" . "[\u0A05-\u0A14]")	; independent vowel
 	   ("C" . "[\u0A15-\u0A39\u0A59-\u0A5E]")	; consonant
-	   ("Y" . "[\u0A2F-u0A30\u0A35\u0A39]") ; YA, RA, VA, HA
+	   ("Y" . "[\u0A2F-\u0A30\u0A35\u0A39]") ; YA, RA, VA, HA
 	   ("n" . "\u0A3C")		; NUKTA
 	   ("v" . "[\u0A3E-\u0A4C]")	; vowel sign
 	   ("H" . "\u0A4D")		; VIRAMA
@@ -221,13 +221,13 @@ gurmukhi-composable-pattern
 
 (defconst gujarati-composable-pattern
   (let ((table
-	 '(("a" . "[\u0A81-\u0A82]")	; SIGN CANDRABINDU .. ANUSVARA
+	 '(("a" . "[\u0A81\u0A82]")	; SIGN CANDRABINDU .. ANUSVARA
 	   ("A" . "\u0A83")		; SIGN VISARGA
-	   ("V" . "[\u0A85-\u0A94\u0AE0-\u0AE1]") ; independent vowel
+	   ("V" . "[\u0A85-\u0A94\u0AE0\u0AE1]") ; independent vowel
 	   ("C" . "[\u0A95-\u0AB9]")	; consonant
 	   ("R" . "\u0AB0")		; RA
 	   ("n" . "\u0ABC")		; NUKTA
-	   ("v" . "[\u0ABE-\u0ACC\u0AE2-\u0AE3]") ; vowel sign
+	   ("v" . "[\u0ABE-\u0ACC\u0AE2\u0AE3]") ; vowel sign
 	   ("H" . "\u0ACD")		; VIRAMA
 	   ("N" . "\u200C")		; ZWNJ
 	   ("J" . "\u200D")		; ZWJ
@@ -248,13 +248,13 @@ gujarati-composable-pattern
 (defconst oriya-composable-pattern
   (let ((table
 	 '(("a" . "\u0B01")		; SIGN CANDRABINDU
-	   ("A" . "[\u0B02-\u0B03]")	; SIGN ANUSVARA .. VISARGA
-	   ("V" . "[\u0B05-\u0B14\u0B60-\u0B61]") ; independent vowel
-	   ("C" . "[\u0B15-\u0B39\u0B5C-\u0B5D\u0B71]")	; consonant
-	   ("B" . "[\u0B15-\u0B17\u0B1B-\u0B1D\u0B1F-\u0B21\u0B23-\u0B24\u0B27-\u0B30\u0B32-\u0B35\u0B38-\u0B39]") ; consonant with below form
+	   ("A" . "[\u0B02\u0B03]")	; SIGN ANUSVARA .. VISARGA
+	   ("V" . "[\u0B05-\u0B14\u0B60\u0B61]") ; independent vowel
+	   ("C" . "[\u0B15-\u0B39\u0B5C\u0B5D\u0B71]")	; consonant
+	   ("B" . "[\u0B15-\u0B17\u0B1B-\u0B1D\u0B1F-\u0B21\u0B23\u0B24\u0B27-\u0B30\u0B32-\u0B35\u0B38\u0B39]") ; consonant with below form
 	   ("R" . "\u0B30")		; RA
 	   ("n" . "\u0B3C")		; NUKTA
-	   ("v" . "[\u0B3E-\u0B4C\u0B56-\u0B57\u0B62-\u0B63]") ; vowel sign
+	   ("v" . "[\u0B3E-\u0B4C\u0B56\u0B57\u0B62\u0B63]") ; vowel sign
 	   ("H" . "\u0B4D")		; VIRAMA
 	   ("N" . "\u200C")		; ZWNJ
 	   ("J" . "\u200D")		; ZWJ
@@ -296,9 +296,9 @@ tamil-composable-pattern
 (defconst telugu-composable-pattern
   (let ((table
 	 '(("a" . "[\u0C01-\u0C03]")	; SIGN CANDRABINDU .. VISARGA
-	   ("V" . "[\u0C05-\u0C14\u0C60-\u0C61]") ; independent vowel
-	   ("C" . "[\u0C15-\u0C39\u0C58-\u0C59]") ; consonant
-	   ("v" . "[\u0C3E-\u0C4C\u0C55-\u0C56\u0C62-\u0C63]")	; vowel sign
+	   ("V" . "[\u0C05-\u0C14\u0C60\u0C61]") ; independent vowel
+	   ("C" . "[\u0C15-\u0C39\u0C58\u0C59]") ; consonant
+	   ("v" . "[\u0C3E-\u0C4C\u0C55\u0C56\u0C62\u0C63]")	; vowel sign
 	   ("H" . "\u0C4D")		; VIRAMA
 	   ("N" . "\u200C")		; ZWNJ
 	   ("J" . "\u200D")		; ZWJ
@@ -318,12 +318,12 @@ telugu-composable-pattern
 
 (defconst kannada-composable-pattern
   (let ((table
-	 '(("A" . "[\u0C82-\u0C83]")	; SIGN ANUSVARA .. VISARGA
-	   ("V" . "[\u0C85-\u0C94\u0CE0-\u0CE1]") ; independent vowel
-	   ("C" . "[\u0C95-\u0CB9\u0CDE]")	  ; consonant
+	 '(("A" . "[\u0C82\u0C83]")	; SIGN ANUSVARA .. VISARGA
+	   ("V" . "[\u0C85-\u0C94\u0CE0\u0CE1]") ; independent vowel
+	   ("C" . "[\u0C95-\u0CB9\u0CDE]")	 ; consonant
 	   ("R" . "\u0CB0")		; RA
 	   ("n" . "\u0CBC")		; NUKTA
-	   ("v" . "[\u0CBE-\u0CCC\u0CD5-\u0CD6\u0CE2-\u0CE3]") ; vowel sign
+	   ("v" . "[\u0CBE-\u0CCC\u0CD5\u0CD6\u0CE2\u0CE3]") ; vowel sign
 	   ("H" . "\u0CCD")		; VIRAMA
 	   ("N" . "\u200C")		; ZWNJ
 	   ("J" . "\u200D")		; ZWJ
@@ -343,11 +343,11 @@ kannada-composable-pattern
 
 (defconst malayalam-composable-pattern
   (let ((table
-	 '(("A" . "[\u0D02-\u0D03]")	; SIGN ANUSVARA .. VISARGA
-	   ("V" . "[\u0D05-\u0D14\u0D60-\u0D61]") ; independent vowel
+	 '(("A" . "[\u0D02\u0D03]")	; SIGN ANUSVARA .. VISARGA
+	   ("V" . "[\u0D05-\u0D14\u0D60\u0D61]")  ; independent vowel
 	   ("C" . "[\u0D15-\u0D39]")		  ; consonant
-	   ("Y" . "[\u0D2F-\u0D30\u0D32\u0D35]")  ; YA, RA, LA, VA
-	   ("v" . "[\u0D3E-\u0D4C\u0D57\u0D62-\u0D63]")	; postbase matra
+	   ("Y" . "[\u0D2F\u0D30\u0D32\u0D35]")   ; YA, RA, LA, VA
+	   ("v" . "[\u0D3E-\u0D4C\u0D57\u0D62\u0D63]")	; postbase matra
 	   ("H" . "\u0D4D")			  ; SIGN VIRAMA
 	   ("N" . "\u200C")			  ; ZWNJ
 	   ("J" . "\u200D")			  ; ZWJ
diff --git a/lisp/language/tibetan.el b/lisp/language/tibetan.el
index 4be25cecab..b42a1e8fb8 100644
--- a/lisp/language/tibetan.el
+++ b/lisp/language/tibetan.el
@@ -549,19 +549,16 @@ tibetan-precomposition-rule-alist
     ("à½¦à¾¨" . "ö†°")))
 
 (defconst tibetan-regexp
-  (let ((l (list tibetan-precomposed-transcription-alist
-		 tibetan-consonant-transcription-alist
-		 tibetan-vowel-transcription-alist
-		 tibetan-modifier-transcription-alist
-		 tibetan-subjoined-transcription-alist))
-	(separator "\\|")
-	tail pattern)
-    (while l
-      (setq tail (car l) l (cdr l))
-      (while tail
-	(setq pattern (cons separator (cons (car (car tail)) pattern))
-	      tail (cdr tail))))
-    (apply 'concat (nreverse (cdr pattern))))
+  (let (pattern)
+    (dolist (alist (list tibetan-precomposed-transcription-alist
+			 tibetan-consonant-transcription-alist
+			 tibetan-vowel-transcription-alist
+			 tibetan-modifier-transcription-alist
+			 tibetan-subjoined-transcription-alist)
+		   (apply #'concat (nreverse (cdr pattern))))
+      (dolist (key-val alist)
+	(setq pattern (cons "\\|" (cons (regexp-quote (car key-val))
+					pattern))))))
   "Regexp matching a Tibetan transcription of a composable Tibetan sequence.
 The result of matching is to be used for indexing alists at conversion
 from a roman transcription to the corresponding Tibetan character.")
diff --git a/lisp/textmodes/rst.el b/lisp/textmodes/rst.el
index 88c44c06da..b7438fbb10 100644
--- a/lisp/textmodes/rst.el
+++ b/lisp/textmodes/rst.el
@@ -388,8 +388,8 @@ rst-re-alist-def
 				        ; item tag.
 
     ;; Inline markup (`ilm')
-    (ilm-pfx (:alt "^" hws-prt "[-'\"([{<â€˜â€œÂ«â€™/:]"))
-    (ilm-sfx (:alt "$" hws-prt "[]-'\")}>â€™â€Â»/:.,;!?\\]"))
+    (ilm-pfx (:alt "^" hws-prt "['\"([{<â€˜â€œÂ«â€™/:-]"))
+    (ilm-sfx (:alt "$" hws-prt "[]'\")}>â€™â€Â»/:.,;!?\\-]"))
 
     ;; Inline markup content (`ilc')
     (ilcsgl-tag "\\S ") ; A single non-white character.
@@ -431,7 +431,7 @@ rst-re-alist-def
     (fld-tag ":" fldnam-tag ":") ; A field marker.
 
     ;; Options (`opt')
-    (optsta-tag (:alt "[-+/]" "--")) ; Start of an option.
+    (optsta-tag (:alt "[+/-]" "--")) ; Start of an option.
     (optnam-tag "\\sw" (:alt "-" "\\sw") "*") ; Name of an option.
     (optarg-tag (:shy "[ =]\\S +")) ; Option argument.
     (optsep-tag (:shy "," hws-prt)) ; Separator between options.
@@ -457,7 +457,7 @@ rst-re-alist-def
 				       ; tag.
 
     ;; Symbol (`sym')
-    (sym-prt "[-+.:_]") ; Non-word part of a symbol.
+    (sym-prt "[+.:_-]") ; Non-word part of a symbol.
     (sym-tag (:shy "\\sw+" (:shy sym-prt "\\sw+") "*"))
 
     ;; URIs (`uri')
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-04 21:42 ` Paul Eggert
@ 2019-10-05  8:10   ` Eli Zaretskii
  2019-10-05  9:37     ` Mattias Engdegård
                       ` (2 more replies)
  2019-10-05 10:03   ` Mattias Engdegård
  1 sibling, 3 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05  8:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 4 Oct 2019 14:42:41 -0700
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> Thanks, I installed the attached patch, which I hope fixes all the bugs 
> and style glitches uncovered by that scan, along with some other style 
> glitches I noticed in the neighborhood.

I question the need for "fixing" those "style glitches" (and even the
very existence of a "style glitch", which sounds like a contradiction
of terms to me).

>  (defconst iso8601--year-match
> -  "\\([-+]\\)?\\([0-9][0-9][0-9][0-9]\\)")
> +  "\\([+-]\\)?\\([0-9][0-9][0-9][0-9]\\)")

What is the purpose of this and other similar changes?  AFAIK, both
variants are valid, so it sounds like your personal stylistic
preference is for the latter.  Is that the only reason?  If so, let's
please not make changes where this is the only reason, as doing so
risks breaking code (due to typos) and complicates forensics, and
otherwise serves no useful purpose.

>  (defconst tibetan-regexp
> -  (let ((l (list tibetan-precomposed-transcription-alist
> -		 tibetan-consonant-transcription-alist
> -		 tibetan-vowel-transcription-alist
> -		 tibetan-modifier-transcription-alist
> -		 tibetan-subjoined-transcription-alist))
> -	(separator "\\|")
> -	tail pattern)
> -    (while l
> -      (setq tail (car l) l (cdr l))
> -      (while tail
> -	(setq pattern (cons separator (cons (car (car tail)) pattern))
> -	      tail (cdr tail))))
> -    (apply 'concat (nreverse (cdr pattern))))
> +  (let (pattern)
> +    (dolist (alist (list tibetan-precomposed-transcription-alist
> +			 tibetan-consonant-transcription-alist
> +			 tibetan-vowel-transcription-alist
> +			 tibetan-modifier-transcription-alist
> +			 tibetan-subjoined-transcription-alist)
> +		   (apply #'concat (nreverse (cdr pattern))))
> +      (dolist (key-val alist)
> +	(setq pattern (cons "\\|" (cons (regexp-quote (car key-val))
> +					pattern))))))
>    "Regexp matching a Tibetan transcription of a composable Tibetan sequence.

This non-trivial change is documented as

    * lisp/language/tibetan.el (tibetan-regexp):
    Quote `+' in regexp to pacify the regexp scanner.  Simplify.

If the regexp scanner needs to be pacified, isn't it better to fix the
scanner instead?

I also don't think I see the simplification here.  In fact, the
original code looks simpler to me than the new one, as the former is
just a simple while loop, whereas the latter is a nested dolist.

Thanks.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05  8:10   ` Eli Zaretskii
@ 2019-10-05  9:37     ` Mattias Engdegård
  2019-10-05 10:49       ` Eli Zaretskii
  2019-10-05  9:52     ` Paul Eggert
  2019-10-05 16:59     ` Lars Ingebrigtsen
  2 siblings, 1 reply; 24+ messages in thread
From: Mattias Engdegård @ 2019-10-05  9:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, emacs-devel

5 okt. 2019 kl. 10.10 skrev Eli Zaretskii <eliz@gnu.org>:

> If the regexp scanner needs to be pacified, isn't it better to fix the
> scanner instead?

The check is there because it is often useful. Relint/xr has been detecting and complaining about non-escaped use of special characters such as +, *, ?, ^ and $ for some time now, and for good reason: it's an error-prone exploitation of a hole in the syntax. We think that "*.^" is better written "\\*.\\^" because the latter is more regular, less likely to break when modified, and tells the reader that no, it isn't a mistake, the programmer knows what he is doing.

Such non-essential escaping has been added many times before, and it has never been controversial in the slightest.

> I also don't think I see the simplification here.  In fact, the
> original code looks simpler to me than the new one, as the former is
> just a simple while loop, whereas the latter is a nested dolist.

Actually the original was a nested pair of while loops, which indicates that it wasn't quite that simple.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05  9:37     ` Mattias Engdegård
@ 2019-10-05 10:49       ` Eli Zaretskii
  2019-10-05 15:16         ` Stefan Monnier
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05 10:49 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: eggert, emacs-devel

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 5 Oct 2019 11:37:45 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>, emacs-devel@gnu.org
> 
> 5 okt. 2019 kl. 10.10 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > If the regexp scanner needs to be pacified, isn't it better to fix the
> > scanner instead?
> 
> The check is there because it is often useful. Relint/xr has been detecting and complaining about non-escaped use of special characters such as +, *, ?, ^ and $ for some time now, and for good reason: it's an error-prone exploitation of a hole in the syntax. We think that "*.^" is better written "\\*.\\^" because the latter is more regular, less likely to break when modified, and tells the reader that no, it isn't a mistake, the programmer knows what he is doing.
> 
> Such non-essential escaping has been added many times before, and it has never been controversial in the slightest.

I'm not sure I understand: is there a technical reason for producing
what sounds like a false positive in these cases, or is this done with
some principle in mind?  If the latter, what is the underlying
principle?

In general, I'd rather we didn't flag valid constructs if we can, as
doing that is an annoyance for programmers.  In particular, I find
nothing wrong with "*.^", and I find "\\*.\\^" harder to read.

> > I also don't think I see the simplification here.  In fact, the
> > original code looks simpler to me than the new one, as the former is
> > just a simple while loop, whereas the latter is a nested dolist.
> 
> Actually the original was a nested pair of while loops, which indicates that it wasn't quite that simple.

It's quite simple when I read it.  So this appears to be a stylistic
preference again.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 10:49       ` Eli Zaretskii
@ 2019-10-05 15:16         ` Stefan Monnier
  2019-10-05 16:02           ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Monnier @ 2019-10-05 15:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mattias Engdegård, eggert, emacs-devel

> In general, I'd rather we didn't flag valid constructs if we can, as
> doing that is an annoyance for programmers.  In particular, I find
> nothing wrong with "*.^", and I find "\\*.\\^" harder to read.

It's more error prone, e.g. if you later concat some prefix to the regexp.


        Stefan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 15:16         ` Stefan Monnier
@ 2019-10-05 16:02           ` Eli Zaretskii
  0 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05 16:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mattiase, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Mattias Engdegård <mattiase@acm.org>,
>   eggert@cs.ucla.edu,
>   emacs-devel@gnu.org
> Date: Sat, 05 Oct 2019 11:16:04 -0400
> 
> > In general, I'd rather we didn't flag valid constructs if we can, as
> > doing that is an annoyance for programmers.  In particular, I find
> > nothing wrong with "*.^", and I find "\\*.\\^" harder to read.
> 
> It's more error prone, e.g. if you later concat some prefix to the regexp.

No one says such concatenation will ever happen.  Besides, any change
requires one to make sure the result is valid anyway.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05  8:10   ` Eli Zaretskii
  2019-10-05  9:37     ` Mattias Engdegård
@ 2019-10-05  9:52     ` Paul Eggert
  2019-10-05 10:59       ` Eli Zaretskii
  2019-10-05 16:59     ` Lars Ingebrigtsen
  2 siblings, 1 reply; 24+ messages in thread
From: Paul Eggert @ 2019-10-05  9:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/5/19 1:10 AM, Eli Zaretskii wrote:
> both variants are valid, so it sounds like your personal stylistic
> preference is for the latter.  Is that the only reason?

No. The "Special Characters in Regular Expressions" subsubsection of the Elisp 
manual recommends putting '-' at the end of bracket expressions, since putting 
'-' elsewhere can cause confusion (one or two examples of which caused bugs that 
was fixed in that patch). The patch did not systematically put "-" at the end of 
every bracket expression, only those reasonably near bugs and other glitches.

> If the regexp scanner needs to be pacified, isn't it better to fix the
> scanner instead?

No, as the regexp scanner was right here: it complained about unescaped literal 
"+" in regexps, which is poor practice (and is documented as poor practice in 
the Elisp manual).

> the
> original code looks simpler to me than the new one, as the former is
> just a simple while loop, whereas the latter is a nested dolist.

Actually the old code was so confusing that it was a bit hard to see that it was 
a nested loop (not a simple while loop). I originally had the same confusion 
that you did, which is why I redid the loops to make the nesting more obvious 
and to lessen the number of lines of code.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05  9:52     ` Paul Eggert
@ 2019-10-05 10:59       ` Eli Zaretskii
  2019-10-05 15:20         ` Stefan Monnier
  2019-10-05 19:19         ` Paul Eggert
  0 siblings, 2 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05 10:59 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 5 Oct 2019 02:52:34 -0700
> 
> On 10/5/19 1:10 AM, Eli Zaretskii wrote:
> > both variants are valid, so it sounds like your personal stylistic
> > preference is for the latter.  Is that the only reason?
> 
> No. The "Special Characters in Regular Expressions" subsubsection of the Elisp 
> manual recommends putting '-' at the end of bracket expressions, since putting 
> '-' elsewhere can cause confusion (one or two examples of which caused bugs that 
> was fixed in that patch). The patch did not systematically put "-" at the end of 
> every bracket expression, only those reasonably near bugs and other glitches.

The text in the manual says:

  To include a ‘-’, write ‘-’ as the first or last character of the
  character alternative, or as the upper bound of a range.

That's it.  And I personally can see no confusion in the likes of
"[-+]", whereas I did need to consult the manual to learn that "[+-]"
is also right.  So at least for me, the confusion worked the other way
around.

> > If the regexp scanner needs to be pacified, isn't it better to fix the
> > scanner instead?
> 
> No, as the regexp scanner was right here: it complained about unescaped literal 
> "+" in regexps, which is poor practice (and is documented as poor practice in 
> the Elisp manual).

Poor, but correct.

Look, this scanner is a good tool, but forcing a particular set of
stylistic preferences on the project by using that tool, let alone
silently so, is more than we have bid for.  If you have strong enough
opinions on that, let's discuss this first, document the agreements in
the manual, and only apply them after that.

> > the
> > original code looks simpler to me than the new one, as the former is
> > just a simple while loop, whereas the latter is a nested dolist.
> 
> Actually the old code was so confusing that it was a bit hard to see that it was 
> a nested loop (not a simple while loop). I originally had the same confusion 
> that you did, which is why I redid the loops to make the nesting more obvious 
> and to lessen the number of lines of code.

That's again a personal stylistic preference that I'd rather not
regard as a reason strong enough to change correct code.  If anything,
it runs the risk of introducing bugs that can lie low years before
they are detected.  If you think the code might be incorrect, find an
test case where it misbehaves, then modifying it will be justified.

In general, I find that we make too many changes whose justification
is weak at best and non-existent at worst.  This both wastes our
limited resources and risks introducing bugs for no good reason.
Let's try to restrain ourselves and not make changes just because we
can.  Emacs is an old and stable program; we should try not to
destabilize it unless we fix real bugs or introduce new features.

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 10:59       ` Eli Zaretskii
@ 2019-10-05 15:20         ` Stefan Monnier
  2019-10-05 16:03           ` Eli Zaretskii
  2019-10-05 19:19         ` Paul Eggert
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Monnier @ 2019-10-05 15:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, Paul Eggert, emacs-devel

> That's it.  And I personally can see no confusion in the likes of
> "[-+]", whereas I did need to consult the manual to learn that "[+-]"
> is also right.  So at least for me, the confusion worked the other way
> around.

As discussed here a while back, putting `-` as the last char is actually
the only choice which always works (because it can't be the first char
when that first char has to be `]`), so I recommend to always put `-` as
the last char.


        Stefan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 15:20         ` Stefan Monnier
@ 2019-10-05 16:03           ` Eli Zaretskii
  2019-10-06 13:42             ` Stefan Monnier
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05 16:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mattiase, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  mattiase@acm.org,  emacs-devel@gnu.org
> Date: Sat, 05 Oct 2019 11:20:24 -0400
> 
> > That's it.  And I personally can see no confusion in the likes of
> > "[-+]", whereas I did need to consult the manual to learn that "[+-]"
> > is also right.  So at least for me, the confusion worked the other way
> > around.
> 
> As discussed here a while back, putting `-` as the last char is actually
> the only choice which always works (because it can't be the first char
> when that first char has to be `]`), so I recommend to always put `-` as
> the last char.

That's unrelated.  I was talking about modifying _existing_ code, not
about writing new code.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 16:03           ` Eli Zaretskii
@ 2019-10-06 13:42             ` Stefan Monnier
  2019-10-06 18:01               ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Monnier @ 2019-10-06 13:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, eggert, emacs-devel

> That's unrelated.  I was talking about modifying _existing_ code, not
> about writing new code.

In my experience, a lot of new code is written by looking at old code
(witness all the quoted lambdas around that can be found in "new" code).


        Stefan




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-06 13:42             ` Stefan Monnier
@ 2019-10-06 18:01               ` Eli Zaretskii
  0 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-06 18:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mattiase, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: eggert@cs.ucla.edu,  mattiase@acm.org,  emacs-devel@gnu.org
> Date: Sun, 06 Oct 2019 09:42:31 -0400
> 
> > That's unrelated.  I was talking about modifying _existing_ code, not
> > about writing new code.
> 
> In my experience, a lot of new code is written by looking at old code
> (witness all the quoted lambdas around that can be found in "new" code).

Which is why we have code reviews.  Witness the many messages by
yourself and others that comment on those practices we are unhappy
about.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 10:59       ` Eli Zaretskii
  2019-10-05 15:20         ` Stefan Monnier
@ 2019-10-05 19:19         ` Paul Eggert
  2019-10-05 19:31           ` Eli Zaretskii
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Eggert @ 2019-10-05 19:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/5/19 3:59 AM, Eli Zaretskii wrote:
> The text in the manual says:
> 
>    To include a ‘-’, write ‘-’ as the first or last character of the
>    character alternative, or as the upper bound of a range.

That quote is from an older version of the manual, which was amended after some 
discussion on this very point. The manual now distinguishes more carefully 
between what's recommended, and what's valid even though it may cause confusion. 
The manual now recommends putting literal '-' at the end and says that putting 
it elsewhere is not the best style even when it is valid.

I take your point about not modifying Emacs just for the fun of it. However, 
when bugs are being fixed, it's OK if the person doing the fix takes the 
opportunity to make nearby code clearer, as the added clarity can increase the 
probability of the fix being correct and can simplify later maintenance. 
Although one can go overboard in the process and it's sometimes better to leave 
code alone even when it is confusing, generally speaking we should delegate 
these minor decisions to whoever's actually doing the work, in order to 
encourage bug-fixers rather than throw cold water on them.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 19:19         ` Paul Eggert
@ 2019-10-05 19:31           ` Eli Zaretskii
  2019-10-05 19:50             ` Paul Eggert
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-05 19:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 5 Oct 2019 12:19:16 -0700
> 
> I take your point about not modifying Emacs just for the fun of it. However, 
> when bugs are being fixed, it's OK if the person doing the fix takes the 
> opportunity to make nearby code clearer, as the added clarity can increase the 
> probability of the fix being correct and can simplify later maintenance. 

Sorry, I disagree.  Especially since "nearby" was IMO interpreted very
leniently here.  We only agreed to do "nearby" fixes that basically
don't affect the code to be executed, such as whitespace changes, typo
corrections, removal of the 'register' qualifier, etc.

> Although one can go overboard in the process and it's sometimes better to leave 
> code alone even when it is confusing

The code in question was not confusing by any measure, not IMO.  And
what's confusing for someone could not be confusing for others.  E.g.,
I consider the new code in tibetan.el "confusing", so should I go
ahead and change it back to what it was before?  I don't want us to
start disputes over stylistic issues, so let's respect stylistic
preferences of whoever wrote the code, especially if changing them
means a real non-trivial code change.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 19:31           ` Eli Zaretskii
@ 2019-10-05 19:50             ` Paul Eggert
  2019-10-06 17:19               ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Eggert @ 2019-10-05 19:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/5/19 12:31 PM, Eli Zaretskii wrote:
> I consider the new code in tibetan.el "confusing", so should I go
> ahead and change it back to what it was before?

Sure, feel free. It's certainly not worth arguing about. But don't lose the 
regexp-quote fix while you change it back.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 19:50             ` Paul Eggert
@ 2019-10-06 17:19               ` Eli Zaretskii
  2019-10-06 17:33                 ` Paul Eggert
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-06 17:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 5 Oct 2019 12:50:59 -0700
> 
> On 10/5/19 12:31 PM, Eli Zaretskii wrote:
> > I consider the new code in tibetan.el "confusing", so should I go
> > ahead and change it back to what it was before?
> 
> Sure, feel free.

I'd like to try to avoid the need for that in the future, because it
leaves bad taste in people's mouth, and sometimes wastes time an
energy.

Can we please agree to make stylistic changes only if either (a) the
change is in the code which is being modified/fixed anyway, or (b) the
change doesn't affect the code produced from the modified source?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-06 17:19               ` Eli Zaretskii
@ 2019-10-06 17:33                 ` Paul Eggert
  2019-10-06 18:53                   ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Eggert @ 2019-10-06 17:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/6/19 10:19 AM, Eli Zaretskii wrote:
> Can we please agree to make stylistic changes only if either (a) the
> change is in the code which is being modified/fixed anyway, or (b) the
> change doesn't affect the code produced from the modified source?

That's what I did in this particular instance. Obviously you disagree, but 
that's what I thought at the time, and still do. Let's just agree to disagree; 
it's not worth arguing about.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-06 17:33                 ` Paul Eggert
@ 2019-10-06 18:53                   ` Eli Zaretskii
  2019-10-06 19:19                     ` Paul Eggert
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-06 18:53 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 6 Oct 2019 10:33:34 -0700
> 
> On 10/6/19 10:19 AM, Eli Zaretskii wrote:
> > Can we please agree to make stylistic changes only if either (a) the
> > change is in the code which is being modified/fixed anyway, or (b) the
> > change doesn't affect the code produced from the modified source?
> 
> That's what I did in this particular instance.

Was it (a) or (b), according to your interpretation?

> Obviously you disagree, but that's what I thought at the time, and
> still do. Let's just agree to disagree; it's not worth arguing
> about.

We cannot agree to disagree on this, because it means we will keep
bumping at this and arguing about such changes in the future.  I'm
trying to find a compromise that we all could live with.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-06 18:53                   ` Eli Zaretskii
@ 2019-10-06 19:19                     ` Paul Eggert
  2019-10-06 19:31                       ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Eggert @ 2019-10-06 19:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/6/19 11:53 AM, Eli Zaretskii wrote:

> Was it (a) or (b), according to your interpretation?

Partly (a) and partly (b). Really, it doesn't matter.

> we will keep
> bumping at this and arguing about such changes in the future.

I'm not arguing about the change now. As I said, please feel free to revert 
whatever part of it that you dislike (though I suggest keeping the bug fixes). I 
don't think unimportant little cases like this are worth either your time or 
mine (though perhaps we'll have to disagree about that as well :-).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-06 19:19                     ` Paul Eggert
@ 2019-10-06 19:31                       ` Eli Zaretskii
  0 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-10-06 19:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: mattiase, emacs-devel

> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 6 Oct 2019 12:19:17 -0700
> 
> > we will keep
> > bumping at this and arguing about such changes in the future.
> 
> I'm not arguing about the change now. As I said, please feel free to revert 
> whatever part of it that you dislike (though I suggest keeping the bug fixes). I 
> don't think unimportant little cases like this are worth either your time or 
> mine (though perhaps we'll have to disagree about that as well :-).

Would you prefer that I asked you in similar cases in the future to
revert the relevant parts?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05  8:10   ` Eli Zaretskii
  2019-10-05  9:37     ` Mattias Engdegård
  2019-10-05  9:52     ` Paul Eggert
@ 2019-10-05 16:59     ` Lars Ingebrigtsen
  2019-10-05 18:52       ` Paul Eggert
  2 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-05 16:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, Paul Eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Thanks, I installed the attached patch, which I hope fixes all the bugs 
>> and style glitches uncovered by that scan, along with some other style 
>> glitches I noticed in the neighborhood.
>
> I question the need for "fixing" those "style glitches" (and even the
> very existence of a "style glitch", which sounds like a contradiction
> of terms to me).

What's confusing to me is why regexps in particular are being subjected
to these stylistic rewrites.  There's a bunch of styles being used in
Emacs Lisp code, but we avoid doing mass fixups (according to one
preference or another) of working code, because non-functional code
churn makes maintenance more difficult.

I mean, I prefer

(when foo
  (do-something))

over

(if foo
    (do-something))

or

(and foo (do-something))

but I hope that nobody would do a mass-rewrite of the Emacs code base
to prefer `when' here, because it'd make diving into the history of the
code a nightmare.

So what's so special about regexps that "[-+]" has to be mass-rewritten
as "[+-]"?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-05 16:59     ` Lars Ingebrigtsen
@ 2019-10-05 18:52       ` Paul Eggert
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Eggert @ 2019-10-05 18:52 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: mattiase, emacs-devel

On 10/5/19 9:59 AM, Lars Ingebrigtsen wrote:
> what's so special about regexps that "[-+]" has to be mass-rewritten
> as "[+-]"?

There's no mass rewriting here. I changed regexps near other (more-complicated) 
regexps that didn't put "-" at the end, regexps that were in fact incorrect 
evidently due to their authors' confusion. Using a consistent style in a 
trouble-prone area can help avoid future bugs when someone changes that code in 
the future.

If you add regexps to Emacs in the future, I suggest putting literal "-" at the 
end of bracket expressions, since that's the style recommended in the manual for 
a solid technical reason, namely to avoid all-too-common confusion in this area 
that causes real bugs. That is why the regexp scanner flags these "-" instances: 
it finds real bugs, some of which were fixed in the most-recent patch in this 
area. Of course there's a cost-benefit analysis to doing lint checking like 
this. In this particular case, the number of actual bugs and the simplicity of 
pacifying the lint checker mean that the benefits of using the checker outweigh 
the cost.

That being said, there's no pressing need to go through all the Emacs code and 
rewrite [-+] to [+-], just as there's no pressing need to go through all the 
Emacs code and remove the useless and distracting "register" keyword.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Emacs regexp scan (Sep 29)
  2019-10-04 21:42 ` Paul Eggert
  2019-10-05  8:10   ` Eli Zaretskii
@ 2019-10-05 10:03   ` Mattias Engdegård
  1 sibling, 0 replies; 24+ messages in thread
From: Mattias Engdegård @ 2019-10-05 10:03 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

4 okt. 2019 kl. 23.42 skrev Paul Eggert <eggert@cs.ucla.edu>:
> 
> Thanks, I installed the attached patch, which I hope fixes all the bugs and style glitches uncovered by that scan, along with some other style glitches I noticed in the neighborhood.

Thank you, I've reviewed it and found no errors. I also grepped for similar mistakes (missing \ or u in Unicode escapes) but didn't find anything else.

Just a small complaint remaining:

 (defconst gurmukhi-composable-pattern
[...]
-	   ("Y" . "[\u0A2F-u0A30\u0A35\u0A39]") ; YA, RA, VA, HA
+	   ("Y" . "[\u0A2F-\u0A30\u0A35\u0A39]") ; YA, RA, VA, HA

The hyphen can be removed.

(You can run the checker yourself by installing the `relint' package from GNU ELPA, and running
M-x relint-directory RET <Emacs source dir> RET.)




^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-10-06 19:31 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-09-29 19:39 Emacs regexp scan (Sep 29) Mattias Engdegård
2019-10-04 21:42 ` Paul Eggert
2019-10-05  8:10   ` Eli Zaretskii
2019-10-05  9:37     ` Mattias Engdegård
2019-10-05 10:49       ` Eli Zaretskii
2019-10-05 15:16         ` Stefan Monnier
2019-10-05 16:02           ` Eli Zaretskii
2019-10-05  9:52     ` Paul Eggert
2019-10-05 10:59       ` Eli Zaretskii
2019-10-05 15:20         ` Stefan Monnier
2019-10-05 16:03           ` Eli Zaretskii
2019-10-06 13:42             ` Stefan Monnier
2019-10-06 18:01               ` Eli Zaretskii
2019-10-05 19:19         ` Paul Eggert
2019-10-05 19:31           ` Eli Zaretskii
2019-10-05 19:50             ` Paul Eggert
2019-10-06 17:19               ` Eli Zaretskii
2019-10-06 17:33                 ` Paul Eggert
2019-10-06 18:53                   ` Eli Zaretskii
2019-10-06 19:19                     ` Paul Eggert
2019-10-06 19:31                       ` Eli Zaretskii
2019-10-05 16:59     ` Lars Ingebrigtsen
2019-10-05 18:52       ` Paul Eggert
2019-10-05 10:03   ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).