unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Noam Postavsky <npostavs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 32939@debbugs.gnu.org
Subject: bug#32939: 27.0.50; Don't reject confusable quotes, just fontify & mention them in error messages
Date: Wed, 24 Oct 2018 18:25:53 -0400	[thread overview]
Message-ID: <877ei7m172.fsf@gmail.com> (raw)
In-Reply-To: <83woqw235n.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 05 Oct 2018 11:44:52 +0300")

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Noam Postavsky <npostavs@gmail.com>
>> Date: Thu, 04 Oct 2018 19:58:10 -0400
>> 
>> As a followup from [1], here is a patchset which replaces the master's
>> current rejection of confusable quotes on `read', and instead adds a
>> hint to the error message when evaluating an expression with a
>> confusable quote leads to an error.
>> 
>> And also highlight such quotes with warning-face.
>
> Thanks.  If this is accepted, it will need documentation changes,
> including NEWS.

I adjusted the first patch so it applies cleanly to more recent master,
and added a NEWS entry for the fontification.  I'm not sure about what
other documentation we need though.


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 6132 bytes --]

From 6d3b86eb41660d8908646d8461a7afff03fd17b7 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 10 Mar 2018 18:20:45 -0500
Subject: [PATCH v2 1/2] Stop signaling an error when reading "smart quotes" in
 symbols

Revert commits from 2018-01-28 "Fix round tripping of read->print for
symbols with strange quotes", and 2017-07-22 "Signal error for symbol
names with strange quotes (Bug#2967)".
* etc/NEWS: Remove corresponding entries.
* src/character.c (confusable_symbol_character_p):
* test/src/lread-tests.el (lread-tests--old-style-backquotes): Remove.
* src/lread.c (read1): Don't signal error on confusable character.
* src/print.c (print_object): Don't escape confusable characters.
---
 etc/NEWS                |  9 ---------
 src/character.c         | 26 --------------------------
 src/character.h         |  2 --
 src/lread.c             |  7 -------
 src/print.c             |  3 +--
 test/src/lread-tests.el | 17 -----------------
 6 files changed, 1 insertion(+), 63 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 946a823173..8bed92430b 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1089,15 +1089,6 @@ and if the new behavior breaks your code please email
 32252@debbugs.gnu.org.  Because %o and %x can now format signed
 integers, they now support the + and space flags.
 
-** To avoid confusion caused by "smart quotes", the reader signals an
-error when reading Lisp symbols which begin with one of the following
-quotation characters: ‘’‛“”‟〞"'.  A symbol beginning with such a
-character can be written by escaping the quotation character with a
-backslash.  For example:
-
-    (read "‘smart") => (invalid-read-syntax "strange quote" "‘")
-    (read "\\‘smart") == (intern "‘smart")
-
 +++
 ** Omitting variables after '&optional' and '&rest' is now allowed.
 For example (defun foo (&optional)) is no longer an error.  This is
diff --git a/src/character.c b/src/character.c
index 0b14e476c1..edaeca0a35 100644
--- a/src/character.c
+++ b/src/character.c
@@ -1056,32 +1056,6 @@ blankp (int c)
   return XFIXNUM (category) == UNICODE_CATEGORY_Zs; /* separator, space */
 }
 
-
-/* Return true for characters that would read as symbol characters,
-   but graphically may be confused with some kind of punctuation.  We
-   require an escaping backslash, when such characters begin a
-   symbol.  */
-bool
-confusable_symbol_character_p (int ch)
-{
-  switch (ch)
-    {
-    case 0x2018: /* LEFT SINGLE QUOTATION MARK */
-    case 0x2019: /* RIGHT SINGLE QUOTATION MARK */
-    case 0x201B: /* SINGLE HIGH-REVERSED-9 QUOTATION MARK */
-    case 0x201C: /* LEFT DOUBLE QUOTATION MARK */
-    case 0x201D: /* RIGHT DOUBLE QUOTATION MARK */
-    case 0x201F: /* DOUBLE HIGH-REVERSED-9 QUOTATION MARK */
-    case 0x301E: /* DOUBLE PRIME QUOTATION MARK */
-    case 0xFF02: /* FULLWIDTH QUOTATION MARK */
-    case 0xFF07: /* FULLWIDTH APOSTROPHE */
-      return true;
-
-    default:
-      return false;
-    }
-}
-
 signed char HEXDIGIT_CONST hexdigit[UCHAR_MAX + 1] =
   {
 #if HEXDIGIT_IS_CONST
diff --git a/src/character.h b/src/character.h
index 5dff85aed4..6d5349e07d 100644
--- a/src/character.h
+++ b/src/character.h
@@ -683,8 +683,6 @@ char_surrogate_p (int c)
 extern bool printablep (int);
 extern bool blankp (int);
 
-extern bool confusable_symbol_character_p (int ch);
-
 /* Return a translation table of id number ID.  */
 #define GET_TRANSLATION_TABLE(id) \
   (XCDR (XVECTOR (Vtranslation_table_vector)->contents[(id)]))
diff --git a/src/lread.c b/src/lread.c
index 62616cb681..bf06b41caa 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3543,13 +3543,6 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 	    if (! NILP (result) && len == nbytes)
 	      return unbind_to (count, result);
 	  }
-        if (!quoted && multibyte)
-          {
-            int ch = STRING_CHAR ((unsigned char *) read_buffer);
-            if (confusable_symbol_character_p (ch))
-              xsignal2 (Qinvalid_read_syntax, build_string ("strange quote"),
-                        CALLN (Fstring, make_fixnum (ch)));
-          }
 	{
 	  Lisp_Object result;
 	  ptrdiff_t nchars
diff --git a/src/print.c b/src/print.c
index d15ff97b00..3a19bd5e31 100644
--- a/src/print.c
+++ b/src/print.c
@@ -2030,8 +2030,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 		    || c == ',' || c == '.' || c == '`'
 		    || c == '[' || c == ']' || c == '?' || c <= 040
 		    || c == NO_BREAK_SPACE
-                    || confusing
-		    || (i == 1 && confusable_symbol_character_p (c)))
+                    || confusing)
 		  {
 		    printchar ('\\', printcharfun);
 		    confusing = false;
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index f19d98320a..0d7cd962bd 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -142,23 +142,6 @@ lread-tests--last-message
                            "unescaped character literals "
                            "`?\"', `?(', `?)', `?;', `?[', `?]' detected!")))))
 
-(ert-deftest lread-tests--funny-quote-symbols ()
-  "Check that 'smart quotes' or similar trigger errors in symbol names."
-  (dolist (quote-char
-           '(#x2018 ;; LEFT SINGLE QUOTATION MARK
-             #x2019 ;; RIGHT SINGLE QUOTATION MARK
-             #x201B ;; SINGLE HIGH-REVERSED-9 QUOTATION MARK
-             #x201C ;; LEFT DOUBLE QUOTATION MARK
-             #x201D ;; RIGHT DOUBLE QUOTATION MARK
-             #x201F ;; DOUBLE HIGH-REVERSED-9 QUOTATION MARK
-             #x301E ;; DOUBLE PRIME QUOTATION MARK
-             #xFF02 ;; FULLWIDTH QUOTATION MARK
-             #xFF07 ;; FULLWIDTH APOSTROPHE
-             ))
-    (let ((str (format "%cfoo" quote-char)))
-     (should-error (read str) :type 'invalid-read-syntax)
-     (should (eq (read (concat "\\" str)) (intern str))))))
-
 (ert-deftest lread-test-bug26837 ()
   "Test for https://debbugs.gnu.org/26837 ."
   (let ((load-path (cons
-- 
2.11.0


[-- Attachment #3: patch --]
[-- Type: text/plain, Size: 7934 bytes --]

From acb2e416688dd3612225aa593c264f37edb5ba9f Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 10 Mar 2018 18:12:55 -0500
Subject: [PATCH v2 2/2] Improve errors & warnings due to fancy quoted vars
 (Bug#32939)

Add some hints to the message for byte compiler free & unused variable
warnings, and 'void-variable' errors where the variable has confusable
quote characters in it.
* lisp/help.el (uni-confusables), uni-confusables-regexp): New
constants.
(help-command-error-confusable-suggestions): New function, added to
`command-error-function'.
(help-uni-confusable-suggestions): New function.
* lisp/emacs-lisp/bytecomp.el (byte-compile-variable-ref):
* lisp/emacs-lisp/cconv.el (cconv--analyze-use): Use it.

* lisp/emacs-lisp/lisp-mode.el
(lisp--match-confusable-symbol-character): New function.
(lisp-fdefs): Use it to fontify confusable characters with
font-lock-warning-face when they occur in symbol names.
* etc/NEWS: Announce the new fontification behavior.
---
 etc/NEWS                     |  4 ++++
 lisp/emacs-lisp/bytecomp.el  | 10 +++++++--
 lisp/emacs-lisp/cconv.el     |  6 ++++--
 lisp/emacs-lisp/lisp-mode.el | 16 ++++++++++++++-
 lisp/help.el                 | 49 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 8bed92430b..3f86195695 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1089,6 +1089,10 @@ and if the new behavior breaks your code please email
 32252@debbugs.gnu.org.  Because %o and %x can now format signed
 integers, they now support the + and space flags.
 
+** In Emacs Lisp mode, symbols with confusable quotes are highlighted.
+For example, the first character in '‘foo' would be highlighted in
+'font-lock-warning-face'.
+
 +++
 ** Omitting variables after '&optional' and '&rest' is now allowed.
 For example (defun foo (&optional)) is no longer an error.  This is
diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index 8b67e1007d..31d589ab02 100644
--- a/lisp/emacs-lisp/bytecomp.el
+++ b/lisp/emacs-lisp/bytecomp.el
@@ -3407,7 +3407,10 @@ byte-compile-variable-ref
 		  (boundp var)
 		  (memq var byte-compile-bound-variables)
 		  (memq var byte-compile-free-references))
-	(byte-compile-warn "reference to free variable `%S'" var)
+        (let* ((varname (prin1-to-string var))
+               (suggestions (help-uni-confusable-suggestions varname)))
+          (byte-compile-warn "reference to free variable `%s'%s" varname
+                             (if suggestions (concat "\n  " suggestions) "")))
 	(push var byte-compile-free-references))
       (byte-compile-dynamic-variable-op 'byte-varref var))))
 
@@ -3423,7 +3426,10 @@ byte-compile-variable-set
 		  (boundp var)
 		  (memq var byte-compile-bound-variables)
 		  (memq var byte-compile-free-assignments))
-	(byte-compile-warn "assignment to free variable `%s'" var)
+        (let* ((varname (prin1-to-string var))
+               (suggestions (help-uni-confusable-suggestions varname)))
+          (byte-compile-warn "assignment to free variable `%s'%s" varname
+                             (if suggestions (concat "\n  " suggestions) "")))
 	(push var byte-compile-free-assignments))
       (byte-compile-dynamic-variable-op 'byte-varset var))))
 
diff --git a/lisp/emacs-lisp/cconv.el b/lisp/emacs-lisp/cconv.el
index 010026b416..b62c69868b 100644
--- a/lisp/emacs-lisp/cconv.el
+++ b/lisp/emacs-lisp/cconv.el
@@ -591,8 +591,10 @@ cconv--analyze-use
               (eq ?_ (aref (symbol-name var) 0))
 	      ;; As a special exception, ignore "ignore".
 	      (eq var 'ignored))
-       (byte-compile-warn "Unused lexical %s `%S'"
-                          varkind var)))
+       (let ((suggestions (help-uni-confusable-suggestions (symbol-name var))))
+         (byte-compile-warn "Unused lexical %s `%S'%s"
+                            varkind var
+                            (if suggestions (concat "\n  " suggestions) "")))))
     ;; If it's unused, there's no point converting it into a cons-cell, even if
     ;; it's captured and mutated.
     (`(,binder ,_ t t ,_)
diff --git a/lisp/emacs-lisp/lisp-mode.el b/lisp/emacs-lisp/lisp-mode.el
index f31b4d4dd2..87bdafa976 100644
--- a/lisp/emacs-lisp/lisp-mode.el
+++ b/lisp/emacs-lisp/lisp-mode.el
@@ -280,6 +280,17 @@ elisp--font-lock-backslash
          `(face ,font-lock-warning-face
                 help-echo "This \\ has no effect"))))
 
+(defun lisp--match-confusable-symbol-character  (limit)
+  ;; Match a confusable character within a Lisp symbol.
+  (catch 'matched
+    (while t
+      (if (not (re-search-forward uni-confusables-regexp limit t))
+          (throw 'matched nil)
+        ;; Skip confusables inside strings or comments.
+        (save-match-data
+          (if (not (nth 8 (syntax-ppss)))
+              (throw 'matched t)))))))
+
 (let-when-compile
     ((lisp-fdefs '("defmacro" "defun"))
      (lisp-vdefs '("defvar"))
@@ -463,7 +474,10 @@ elisp--font-lock-backslash
            (3 'font-lock-regexp-grouping-construct prepend))
          (lisp--match-hidden-arg
           (0 '(face font-lock-warning-face
-               help-echo "Hidden behind deeper element; move to another line?")))
+                    help-echo "Hidden behind deeper element; move to another line?")))
+         (lisp--match-confusable-symbol-character
+          0 '(face font-lock-warning-face
+                    help-echo "Confusable character"))
          ))
       "Gaudy level highlighting for Emacs Lisp mode.")
 
diff --git a/lisp/help.el b/lisp/help.el
index 6e06fc9e1c..2cc28fca24 100644
--- a/lisp/help.el
+++ b/lisp/help.el
@@ -1489,6 +1489,55 @@ help--make-usage-docstring
     (help--docstring-quote (format "%S" (help--make-usage fn arglist)))))
 
 \f
+
+;; Just some quote-like characters for now.  TODO: generate this stuff
+;; from official Unicode data.
+(defconst uni-confusables
+  '((#x2018 . "'") ;; LEFT SINGLE QUOTATION MARK
+    (#x2019 . "'") ;; RIGHT SINGLE QUOTATION MARK
+    (#x201B . "'") ;; SINGLE HIGH-REVERSED-9 QUOTATION MARK
+    (#x201C . "\"") ;; LEFT DOUBLE QUOTATION MARK
+    (#x201D . "\"") ;; RIGHT DOUBLE QUOTATION MARK
+    (#x201F . "\"") ;; DOUBLE HIGH-REVERSED-9 QUOTATION MARK
+    (#x301E . "\"") ;; DOUBLE PRIME QUOTATION MARK
+    (#xFF02 . "'") ;; FULLWIDTH QUOTATION MARK
+    (#xFF07 . "'") ;; FULLWIDTH APOSTROPHE
+    ))
+
+(defconst uni-confusables-regexp
+  (concat "[" (mapcar #'car uni-confusables) "]"))
+
+(defun help-uni-confusable-suggestions (string)
+  "Return a message describing confusables in STRING."
+  (let ((i 0)
+        (confusables nil))
+    (while (setq i (string-match uni-confusables-regexp string i))
+      (let ((replacement (alist-get (aref string i) uni-confusables)))
+        (push (aref string i) confusables)
+        (setq string (replace-match replacement t t string))
+        (setq i (+ i (length replacement)))))
+    (when confusables
+      (format-message
+       (if (> (length confusables) 1)
+           "Found confusable characters: %s; perhaps you meant: `%s'?"
+         "Found confusable character: %s, perhaps you meant: `%s'?")
+       (mapconcat (lambda (c) (format-message "`%c'" c))
+                  confusables ", ")
+       string))))
+
+(defun help-command-error-confusable-suggestions (data _context _signal)
+  (pcase data
+    (`(void-variable ,var)
+     (let ((suggestions (help-uni-confusable-suggestions
+                         (symbol-name var))))
+       (when suggestions
+         (princ (concat "\n  " suggestions) t))))
+    (_ nil)))
+
+(add-function :after command-error-function
+              #'help-command-error-confusable-suggestions)
+
+\f
 (provide 'help)
 
 ;;; help.el ends here
-- 
2.11.0


  reply	other threads:[~2018-10-24 22:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-04 23:58 bug#32939: 27.0.50; Don't reject confusable quotes, just fontify & mention them in error messages Noam Postavsky
2018-10-05  8:44 ` Eli Zaretskii
2018-10-24 22:25   ` Noam Postavsky [this message]
2018-10-27  9:48     ` Eli Zaretskii
2018-10-30  1:30       ` Noam Postavsky
2018-10-30  7:12         ` Eli Zaretskii
2019-11-17  8:42           ` Lars Ingebrigtsen
2019-11-27 19:42             ` Noam Postavsky
2019-11-28 23:25               ` Noam Postavsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877ei7m172.fsf@gmail.com \
    --to=npostavs@gmail.com \
    --cc=32939@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).