unofficial mirror of emacs-devel@gnu.org 
From: Paul Eggert <eggert@cs.ucla.edu>
To: Aaron Ecay <aaronecay@gmail.com>,
	Noam Postavsky <npostavs@users.sourceforge.net>,
	Emacs developers <emacs-devel@gnu.org>
Cc: Drew Adams <drew.adams@oracle.com>
Subject: Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?
Date: Sat, 3 Feb 2018 20:51:06 -0800
Message-ID: <cb65574b-62e0-83c5-d1ad-c706632acd65@cs.ucla.edu>
In-Reply-To: <87shaigcvs.fsf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

Aaron Ecay wrote:
> I was surprised to learn that this is the case, in light of what is
> said in the Elisp reference about symbol names

Good point; thanks. In the spirit of "be strict about what you generate", the 
Emacs printer should escape any character that is not in the list of characters 
documented in the Elisp manual as being safe (i.e., as not requiring escaping). 
This is elementary future-proofing, and is independent of whether we want Emacs 
to warn about or disallow confusable chars in symbols.
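
To make the "escape everything not documented as safe" rule concrete, here is a minimal standalone sketch of such an allowlist test. The function name symbol_char_is_documented_safe is hypothetical and is not part of print.c; the character set is the one the third patch leaves unescaped, so '?' is deliberately absent here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not Emacs code: return true if C is one of the
   characters left unescaped by the allowlist approach (ASCII letters,
   digits, and a fixed set of punctuation).  Everything else,
   including all non-ASCII characters, is treated as needing '\'.  */
static bool
symbol_char_is_documented_safe (int c)
{
  static const char safe[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-+=*/_~!@$%^&:<>{}";

  /* Reject 0 explicitly: strchr would otherwise match the
     terminating null byte of SAFE.  */
  return 0 < c && c < 0x80 && strchr (safe, c) != NULL;
}
```

With this predicate, U+0456 (0x456) is rejected simply because it is not on the list, which is the future-proofing property described above.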

Proposed patches against 'master' attached. The first merely simplifies the code 
without changing its effect. The second fixes a bug in the manual, which 
incorrectly states that '?' never needs escaping in symbol names. These two 
patches are routine. (I assume the second one should be applied to the emacs-26 
branch instead of to master.)
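
For readers who want to see the simplified number-lookalike test of the first patch outside its context, here is a rough standalone sketch. The names looks_like_number and is_ascii_digit are hypothetical, and the sketch assumes NAME is followed by a null byte, as Lisp string data is.

```c
#include <stdbool.h>
#include <stddef.h>

static bool
is_ascii_digit (unsigned char c)
{
  return '0' <= c && c <= '9';
}

/* Hypothetical standalone version of the check: return true if the
   LEN bytes at NAME could be read as a number, i.e., an optional sign
   followed by digits and 'e'/'E', beginning and ending with a digit.
   NAME[LEN] must be a null byte.  */
static bool
looks_like_number (const char *name, size_t len)
{
  const unsigned char *p = (const unsigned char *) name;
  const unsigned char *end = p + len;

  /* Index 1 is safe to read after a sign because of the trailing
     null byte.  For an empty name the first test fails, so end[-1]
     is never evaluated.  */
  if (! (is_ascii_digit (p[*p == '-' || *p == '+'])
         && is_ascii_digit (end[-1])))
    return false;

  /* Skip the sign or first digit, then the rest of the digits.
     'e' is accepted too; needed for names like 2e10.  */
  do
    p++;
  while (is_ascii_digit (*p) || *p == 'e' || *p == 'E');

  return p == end;
}
```

The do-while form works because the first byte is already known to be either a sign or a digit, so it can be skipped unconditionally.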

The third patch changes the Lisp printer to escape characters as suggested above.

The fourth patch changes the Lisp printer to escape '?' only at the start of a 
symbol. This is nicer for programs using Scheme-style naming conventions in 
Emacs Lisp, e.g., 'fooish?' rather than 'fooishp'. I discovered the need for 
this patch when I wrote the second patch.
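
As an illustration of the combined effect of the third and fourth patches, here is a simplified sketch that escapes a symbol name into a buffer. The function escape_symbol_name is hypothetical; it assumes an ASCII-only, null-terminated name and omits the number-lookalike case, both of which the real printer handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Emacs code: copy NAME to OUT, writing '\'
   before each character that is not on the allowlist.  '?' is on the
   allowlist but is still escaped when it is the first character.
   Assumes NAME is ASCII; OUT must have room for 2 * strlen (NAME) + 1
   bytes.  */
static void
escape_symbol_name (const char *name, char *out)
{
  static const char safe[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-+=*/_~!@$%^&:<>{}?";

  for (size_t i = 0; name[i] != '\0'; i++)
    {
      bool escape = (strchr (safe, name[i]) == NULL
                     || (i == 0 && name[i] == '?'));
      if (escape)
        *out++ = '\\';
      *out++ = name[i];
    }
  *out = '\0';
}
```

Under these assumptions "list?" is copied unchanged, while "?x" becomes "\?x" and "a.b" becomes "a\.b".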

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Simplify-print_object-a-bit.patch --]
[-- Type: text/x-patch; name="0001-Simplify-print_object-a-bit.patch", Size: 2780 bytes --]

From c03b816016f8cc2f15d275e7ad23448366489277 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Feb 2018 20:29:00 -0800
Subject: [PATCH 1/4] Simplify print_object a bit

* src/print.c (print_object): Simplify by using C99
constructs, and by taking advantage of the fact that Lisp
strings are followed by null bytes.
---
 src/print.c | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/src/print.c b/src/print.c
index b3c0f6f..d3eb49d 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1916,38 +1916,29 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
     case Lisp_Symbol:
       {
 	bool confusing;
-	unsigned char *p = SDATA (SYMBOL_NAME (obj));
-	unsigned char *end = p + SBYTES (SYMBOL_NAME (obj));
-	int c;
-	ptrdiff_t i, i_byte;
-	ptrdiff_t size_byte;
-	Lisp_Object name;
-
-	name = SYMBOL_NAME (obj);
-
-	if (p != end && (*p == '-' || *p == '+')) p++;
-	if (p == end)
-	  confusing = 0;
+	Lisp_Object name = SYMBOL_NAME (obj);
+	ptrdiff_t size_byte = SBYTES (name);
+	unsigned char *p = SDATA (name);
+	unsigned char *end = p + size_byte;
+
 	/* If symbol name begins with a digit, and ends with a digit,
 	   and contains nothing but digits and `e', it could be treated
 	   as a number.  So set CONFUSING.
 
-	   Symbols that contain periods could also be taken as numbers,
-	   but periods are always escaped, so we don't have to worry
-	   about them here.  */
-	else if (*p >= '0' && *p <= '9'
-		 && end[-1] >= '0' && end[-1] <= '9')
+	   Symbols that contain '.' or '#' could also be taken as
+	   numbers, but these are always escaped so don't worry about
+	   them here.  */
+	if (c_isdigit (p[*p == '-' || *p == '+']) && c_isdigit (end[-1]))
 	  {
-	    while (p != end && ((*p >= '0' && *p <= '9')
-				/* Needed for \2e10.  */
-				|| *p == 'e' || *p == 'E'))
+	    /* Check for 'e' too; needed for \2e10.  */
+	    do
 	      p++;
+	    while (c_isdigit (*p) || *p == 'e' || *p == 'E');
+
 	    confusing = (end == p);
 	  }
 	else
-	  confusing = 0;
-
-	size_byte = SBYTES (name);
+	  confusing = false;
 
 	if (! NILP (Vprint_gensym)
             && !SYMBOL_INTERNED_IN_INITIAL_OBARRAY_P (obj))
@@ -1958,10 +1949,11 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 	    break;
 	  }
 
-	for (i = 0, i_byte = 0; i_byte < size_byte;)
+	for (ptrdiff_t i = 0, i_byte = 0; i_byte < size_byte;)
 	  {
 	    /* Here, we must convert each multi-byte form to the
 	       corresponding character code before handing it to PRINTCHAR.  */
+	    int c;
 	    FETCH_STRING_CHAR_ADVANCE (c, name, i, i_byte);
 	    maybe_quit ();
 
-- 
2.7.4


[-- Attachment #3: 0002-Say-needs-escaping-at-start-of-symbol.patch --]
[-- Type: text/x-patch; name="0002-Say-needs-escaping-at-start-of-symbol.patch", Size: 1210 bytes --]

From 4b945a3fcbf6ff2bde4595dd8b8f472d1b3d17af Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Feb 2018 20:30:21 -0800
Subject: [PATCH 2/4] Say ? needs escaping at start of symbol.

* doc/lispref/objects.texi: ? is also special.
---
 doc/lispref/objects.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/lispref/objects.texi b/doc/lispref/objects.texi
index af74062..f0420e6 100644
--- a/doc/lispref/objects.texi
+++ b/doc/lispref/objects.texi
@@ -557,7 +557,8 @@ Symbol Type
 of the name suffice as long as the name does not look like a number.
 (If it does, write a @samp{\} at the beginning of the name to force
 interpretation as a symbol.)  The characters @samp{_~!@@$%^&:<>@{@}?} are
-less often used but also require no special punctuation.  Any other
+less often used but also require no special punctuation, except that
+@samp{\} must precede @samp{?} at the start of a symbol.  Any other
 characters may be included in a symbol's name by escaping them with a
 backslash.  In contrast to its use in strings, however, a backslash in
 the name of a symbol simply quotes the single character that follows the
-- 
2.7.4


[-- Attachment #4: 0003-prin1-etc.-now-escape-more-chars-in-symbols.patch --]
[-- Type: text/x-patch; name="0003-prin1-etc.-now-escape-more-chars-in-symbols.patch", Size: 3147 bytes --]

From 2add3a1595f709bb071e2b775970038470b2fab2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Feb 2018 20:30:48 -0800
Subject: [PATCH 3/4] prin1 etc. now escape more chars in symbols

Inspired by email from Aaron Ecay in:
https://lists.gnu.org/r/emacs-devel/2018-02/msg00125.html
* etc/NEWS: Mention this.
* src/print.c (print_object): Escape any character that the
manual does not document as safe to leave unescaped.
---
 etc/NEWS    |  7 +++++++
 src/print.c | 37 +++++++++++++++++++++++++++++++------
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index afd0fba..2a46002 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -87,6 +87,13 @@ regular expression was previously invalid, but is now accepted:
 
    x\{32768\}
 
+** 'print' and related functions now escape more chars in symbols.
+They now escape any symbol character that is outside the documented
+set of characters that do not need escaping.  For example, (print
+(intern "n\u0456l")) now outputs "n\іl" instead of "nіl", as a hint to
+the reader that the "і" is not the usual U+0069 LATIN SMALL LETTER I,
+but is instead U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I.
+
 \f
 * Editing Changes in Emacs 27.1
 
diff --git a/src/print.c b/src/print.c
index d3eb49d..7eca36a 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1959,12 +1959,37 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 
 	    if (escapeflag)
 	      {
-		if (c == '\"' || c == '\\' || c == '\''
-		    || c == ';' || c == '#' || c == '(' || c == ')'
-		    || c == ',' || c == '.' || c == '`'
-		    || c == '[' || c == ']' || c == '?' || c <= 040
-                    || confusing
-		    || (i == 1 && confusable_symbol_character_p (c)))
+		switch (c)
+		  {
+		    /* The Emacs Lisp manual lists these characters as
+		       not requiring escaping in symbols.  Although some
+		       other characters might also work, play it safe
+		       and escape all but these characters.  */
+		  case '!': case '$': case '%': case '&':
+		  case '*': case '-': case '+': case '/':
+		  case '0': case '1': case '2': case '3': case '4':
+		  case '5': case '6': case '7': case '8': case '9':
+		  case ':': case '<': case '=': case '>': case '@':
+		  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+		  case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+		  case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+		  case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+		  case 'Y': case 'Z':
+		  case '^': case '_':
+		  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+		  case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+		  case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+		  case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+		  case 'y': case 'z':
+		  case '{': case '}': case '~':
+		    break;
+
+		  default:
+		    confusing = true;
+		    break;
+		  }
+
+		if (confusing)
 		  {
 		    printchar ('\\', printcharfun);
 		    confusing = false;
-- 
2.7.4


[-- Attachment #5: 0004-Escape-only-at-start-of-symbol.patch --]
[-- Type: text/x-patch; name="0004-Escape-only-at-start-of-symbol.patch", Size: 1913 bytes --]

From 4289ea136de4876b5dfc20d83b5a2556d1b5d8e6 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Feb 2018 20:39:48 -0800
Subject: [PATCH 4/4] Escape ? only at start of symbol

* src/print.c (print_object): Do it.
---
 etc/NEWS    | 4 ++++
 src/print.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 2a46002..c435136 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -94,6 +94,10 @@ set of characters that do not need escaping.  For example, (print
 the reader that the "і" is not the usual U+0069 LATIN SMALL LETTER I,
 but is instead U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I.
 
+** 'print' etc. no longer escape non-initial '?' in symbols.
+For example, the symbol 'list?' is now printed as-is.  Initial '?'
+is still escaped, e.g., (print (intern "?x")) still outputs "\?x".
+
 \f
 * Editing Changes in Emacs 27.1
 
diff --git a/src/print.c b/src/print.c
index 7eca36a..dfd6c50 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1938,7 +1938,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 	    confusing = (end == p);
 	  }
 	else
-	  confusing = false;
+	  confusing = *p == '?';
 
 	if (! NILP (Vprint_gensym)
             && !SYMBOL_INTERNED_IN_INITIAL_OBARRAY_P (obj))
@@ -1969,7 +1969,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 		  case '*': case '-': case '+': case '/':
 		  case '0': case '1': case '2': case '3': case '4':
 		  case '5': case '6': case '7': case '8': case '9':
-		  case ':': case '<': case '=': case '>': case '@':
+		  case ':': case '<': case '=': case '>': case '?': case '@':
 		  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
 		  case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
 		  case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-- 
2.7.4

